The work described here demonstrates the practicality of routine text-to-speech translation.
A set of 329 letter-to-sound rules has been developed. These translate English text into the
International Phonetic Alphabet (IPA), producing correct pronunciations for approximately 90%
of  the  words  in  an  average  text  sample.  Most  of  the  remaining  10%  have  single  errors  easily 
correctable  by  the  listener.  Another  set  of  rules  translates  IPA  into  the  phonetic  coding  for  a 
particular  commercial  speech  synthesizer. 

This  report  describes  the  technical  approach  used  and  the  support  hardware  and  software 
developed.  It  gives  overall  performance  figures,  detailed  statistics  showing  the  importance  of 
each  rule,  and  listings  of  a translation  program  and  a program  used  in  rule  development. 

AUTOMATIC TRANSLATION OF ENGLISH TEXT TO PHONETICS
BY MEANS OF LETTER-TO-SOUND RULES


INTRODUCTION 

Hardware  to  produce  synthetic  speech  existed  in  various  forms  as  early  as  1939.  At 
the  New  York  World’s  Fair  in  that  year,  Homer  Dudley  exhibited  his  Synthetic  Speaker 
[1], the ancestor of many of the more successful speech synthesizers now in use. Today
phonetically programmable synthesizers of reasonable intelligibility are commercially
available for a few thousand dollars. Such devices have stimulated widespread interest in
computer  voice  output  for  various  civilian  and  Department  of  Defense  (DoD)  applications. 

A further impetus to DoD interest comes from the development of narrowband digital
voice-transmission  systems,  such  as  NRL’s  Linear  Predictive  Coder  [2],  and  the  likelihood 
of  their  widespread  future  use.  These  speech-transmission  systems  include  a synthesizer 
that  could  also  be  used  for  computer  voice  output. 

Among  the  most  promising  applications  of  computer  voice  output  are: 

• ways  to  transmit  information  from  English-language  data  bases  to  remote  locations 
by  telephone, 

• a channel  of  communication  with  busy  operators  of  computer-controlled  systems 
who  have  to  give  most  of  their  attention  to  complicated  visual  displays  and  would 
find  extraneous  text  messages  intolerable,  and 

• reading  machines  for  the  blind. 

In  such  applications  the  potential  utility  of  computer-controlled  speech  synthesizers  is 
greatly  enhanced  if  the  speech  is  not  restricted  to  a prestored  vocabulary. 

Among the numerous approaches to providing such unrestricted text-to-speech translation,
the simplest is to use a small set of letter-to-sound rules to guess at the pronunciation
of  any  word.  Each  rule  specifies  a phonetic  correspondence  to  one  or  more  letters.  In 
some  cases  the  letter’s  context  is  used  to  determine  which  rule  should  be  applied.  An 
example  is  the  elementary  school  rule  “when  two  vowels  go  walking,  the  first  one  does 
the  talking,”  which  indicates  that  when  one  vowel  is  followed  by  another,  the  first  is 
transcribed  into  the  long  vowel  phoneme  whereas  the  second  vowel  is  silent  and  receives  no 
phonetic symbol. In other cases no context is necessary, as with the letter j, which
usually receives the /dʒ/ phoneme. (The International Phonetic Alphabet (IPA)* will be
used to denote English phonemes and indicate pronunciations.)

A more  complicated  approach,  and  one  requiring  much  more  storage,  uses  a large 
pronunciation  dictionary  supplemented  by  various  sets  of  rules.  Words  are  isolated  from 
the  text  and  looked  up  in  the  dictionary.  If  the  lookup  fails,  various  rules  are  used  to  break 
the  word  into  constituent  parts  for  which  there  are  dictionary  entries.  Finally,  if  all  else 
fails,  letter-to-sound  rules  are  used  to  guess  at  the  pronunciation. 

A yet more elaborate approach adds syntactic analysis of sentences to the preceding in
order to determine the part of speech of each word. This resolves the pronunciation
ambiguities of words like approximate (adjective or verb?) and house (noun or verb?).
Finally, well beyond the current state of the art, one could imagine an approach incorporating
a semantic analysis sophisticated enough to decide whether unionized refers to unions
or  ionization. 

To  be  attractive  as  a routine  addition  to  computer  systems,  text-to-speech  translation 
cannot  require  a large  fraction  of  the  available  computational  resources.  This  constraint, 
which  is  particularly  strong  for  real-time  military  systems,  precludes  approaches  that  embody 
large  pronouncing  dictionaries  or  linguistic  analysis  programs.  Thus  routine  use  of  text-to- 
speech  translation  is  likely  only  if  sufficient  intelligibility  can  be  attained  with  a limited 
set  of  letter-to-sound  rules. 

We  report  here  on  work  that  has  demonstrated  the  practicality  of  routine  text-to- 
speech  translation.  We  have  developed  a set  of  329  letter-to-sound  rules  that  translate 
English  text  into  the  International  Phonetic  Alphabet  (IPA).  Using  the  50,000-word 
Standard Corpus of Present-Day Edited American English (“Brown Corpus”) [3], we have
determined  that  the  rules  will  produce  correct  pronunciations  for  approximately  90%  of 
the  words  in  an  average  sample  of  English  text.  Typically  the  remaining  10%  have  single 
errors  that  in  most  cases  can  easily  be  mentally  corrected  by  the  listener.  A separate  set 
of  rules  was  developed  to  translate  from  IPA  into  a phonetic  encoding  compatible  with  a 
particular  commercial  speech  synthesizer  (Federal  Screw  Works  Votrax  VS-6). 

In  the  next  section  we  discuss  previous  work  in  text-to-speech  translation.  The 
technical  approach  used  in  the  NRL  system  is  described  in  the  third  section  as  are  the 
support  hardware  and  software  that  we  developed.  Our  results  are  summarized  in  the 
fourth  section.  Together  with  overall  performance  figures,  we  give  detailed  statistics  that 
show  the  importance  of  each  rule.  Our  conclusions  and  our  plans  for  further  work  are 
discussed  in  the  fifth  section.  Descriptions  and  listings  of  two  SNOBOL  programs  that 
were  important  for  our  work  are  included  as  appendixes.  A third  appendix  contains  some 
remarks  on  the  improvement  in  these  programs’  performance  that  followed  our  changing 
from  an  interpreted  version  of  SNOBOL  to  a compiled  version,  FASBOL. 


SOME  EXISTING  TEXT-TO-SPEECH  SYSTEMS 

Text-to-speech  systems  have  been  built  ranging  in  complexity  from  letter-to-sound  rule 
systems  to  dictionary-lookup  systems  with  syntactic  analysis.  We  will  describe  three  briefly: 
those developed at MIT, the University of Keele, and Bell Telephone Laboratories. None
that we encountered, however, completely satisfies all the criteria we imposed:

• The  implementation  must  be  straightforward,  for  reasons  given  in  the  Introduction, 
requiring  little  space  for  the  program  and  none  at  all  for  large  dictionaries; 

• The  translation  rules  must  be  easily  modifiable,  both  to  allow  for  development  and 
improvement of the rules and to permit the system to be tailored to a variety of
special  applications; 

• The  system  should  not  be  tied  to  a particular  hardware  synthesizer; 

• There should be an objective measure of the system’s performance.


MIT  System 

Allen and Lee have reported on research in automatic text translation at the Massachusetts
Institute of Technology [4-8]. The MIT system not only confronts the text-to-
speech  conversion  problem  but  attempts  to  read  printed  text  using  a character  recognizer. 
The  MIT  system  includes  a parts-of-speech  preprocessor  to  aid  in  the  pronunciation  of 
such  homographs  as  refuse,  appropriate,  and  lives.  After  parts-of-speech  analysis,  the  system, 
using  a phrase  analyzer  module,  assigns  such  prosodic  features  as  inflection  and  stress  to 
the  phonetic  transcription.  The  resulting  string  of  phonemes  and  prosodic  features  is 
transformed  to  the  signals  needed  to  operate  the  synthesizer,  designed  in  the  MIT  laboratory. 

The  grapheme-to-phoneme  translator  uses  a typical  dictionary-lookup  approach  with  a 
set  of  letter-to-sound  rules.  One  word  is  isolated  from  the  input  text  and  looked  up  in  a 
dictionary.  If  the  word  is  found  and  has  no  alternate  transcriptions,  the  result  is  passed  to 
the  phrase  analyzer,  assigned  prosodic  features,  and  passed  for  speech-synthesizer  param- 
etrization.  If  an  alternate  transcription  is  encountered,  the  parts-of-speech  information 
obtained  by  the  parts-of-speech  preprocessor  is  used  to  determine  which  transcription  is  to 
be  used.  This  result  is  then  passed  along  the  translation  chain. 

When  a word  is  not  found  in  the  dictionary,  an  attempt  is  made  to  partition  the  word 
into  morphs  and  isolate  affixes.  The  individual  morphs  are  then  looked  up  in  the  dictionary. 
If  they  are  found,  the  result  is  passed  along  for  stress  analysis  and  synthesizer  parame- 
trization  as  before.  When  all  else  fails,  the  set  of  letter-to-sound  rules  is  applied  to  the 
original  input  word. 

Currently the MIT system contains a dictionary of 11,000 words and a set of approximately
400 letter-to-sound rules [9]. The phrase analyzer does not parse a sentence
completely, but techniques to assign prosodic features are being investigated. Each item
in the dictionary requires parts-of-speech information and alternate transcriptions along with
various internal flags. Consequently the amount of external computer storage can grow quite
large. Lee estimates that a 32,000-word dictionary requires approximately 4 million bits [4].
Additionally the internal storage for such a translation program could become quite large when
new features such as syntax analysis and prosodic feature assignment are added. A comprehensive
list of the letter-to-sound rules has not been published, nor has a quantitative evaluation
of  the  system’s  performance. 


University  of  Keele  System 

The  system  developed  at  the  University  of  Keele  in  England  by  Ainsworth  [10]  is  a 
letter-to-sound-rule system that converts text punched on paper tape to symbols used to
generate  parameters  to  control  a speech  synthesizer.  Ainsworth  does  the  translation  to 
speech  in  the  following  steps. 

1. Segmentation into breath groups,

2. Translation to phonemes via letter-to-sound rules,

3.  Lexical  stress  assignment, 

4.  Speech  synthesizer  parametrization. 

Step  1 inserts  pauses  at  convenient  locations,  to  provide  more  natural  sounding  speech. 
A translation  buffer  of  about  50  characters  is  filled  until  a punctuation  mark  is  encountered. 
This  buffer  becomes  a breath  group.  If  the  buffer  is  filled  before  a punctuation  mark  is 
encountered, the buffer is searched for a conjunction, and the buffer up to the conjunction
becomes a breath group. If a conjunction does not occur, an auxiliary verb, a preposition,
or an article is searched for. Otherwise the entire contents of the buffer become a breath
group. 
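
The segmentation in Step 1 can be sketched as follows. This is a minimal illustration in Python, not Ainsworth's implementation: the word lists are hypothetical (the sources do not enumerate them), and splitting at the last qualifying word in the buffer, rather than the first, is an assumption made here so that each group is as long as possible.

```python
import re

# Hypothetical word lists: the report does not enumerate the words Ainsworth used.
CONJUNCTIONS = {"and", "but", "or"}
FALLBACK_WORDS = {"is", "was", "have", "of", "in", "to", "the", "a", "an"}


def _split_before(window, words):
    """Index of the last complete word from `words` starting after position 0, else 0."""
    best = 0
    for m in re.finditer(r"[A-Za-z']+", window):
        complete = m.end() < len(window)  # ignore a word cut off by the buffer edge
        if m.start() > 0 and complete and m.group().lower() in words:
            best = m.start()
    return best


def breath_groups(text, buffer_size=50):
    """Split text into breath groups using a fixed-size translation buffer."""
    groups = []
    text = text.strip()
    while text:
        window = text[:buffer_size]
        mark = re.search(r"[.,;:!?]", window)
        if mark:
            cut = mark.end()              # a punctuation mark closes the group
        elif len(text) <= buffer_size:
            cut = len(text)               # the remaining text fits in the buffer
        else:
            # Buffer full: split before a conjunction, else before an auxiliary
            # verb, preposition, or article, else take the whole buffer.
            cut = (_split_before(window, CONJUNCTIONS)
                   or _split_before(window, FALLBACK_WORDS)
                   or len(window))
        groups.append(text[:cut].strip())
        text = text[cut:].lstrip()
    return groups
```

With a 20-character buffer, for instance, `breath_groups("we went home and then we slept", 20)` splits the text before the conjunction.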

Step  2 provides  the  translation  of  input  text  to  phonemes.  Ainsworth’s  rules  are 
intended  to  produce  a dialect  of  British  English.  These  rules  are  context  sensitive,  and  the 
order  of  their  application  is  critical.  For  example,  the  rules 

(o)ing  /əʊ/

and  (oi)  /ɔɪ/

occur in that order among the rules for translating the letter o. The first illustrates context
dependence; it states that o, in the context of following ing, is pronounced as /əʊ/, like the
o in going in Ainsworth’s dialect. The order is important since going matches both rules.
In such a case the first matching rule is used; if the order were reversed, the oi in going
would be transcribed as /ɔɪ/, the sound of the oi in coin. Ainsworth’s rules were the starting
point for the development of the rules used by the NRL system.
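
The effect of rule order can be illustrated with a short Python sketch. The tuple layout and the function name are hypothetical, and the output labels "OW" and "OY" are plain-text stand-ins for the two vowel sounds discussed above rather than Ainsworth's own symbols.

```python
# Each rule: (letters to translate, required right context, output label).
RULES_FOR_O = [
    ("o", "ing", "OW"),   # o before "ing", as in "going"
    ("oi", "", "OY"),     # oi anywhere, as in "coin"
]


def first_match(word, pos, rules):
    """Return (letters, output) for the first rule that matches at `pos`, else None."""
    for letters, right, output in rules:
        end = pos + len(letters)
        if word.startswith(letters, pos) and word.startswith(right, end):
            return letters, output
    return None
```

With the rules in the order given, `first_match("going", 1, RULES_FOR_O)` selects the context-dependent o rule; with the list reversed, the oi rule would win and going would receive the vowel of coin.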


Ainsworth reports performance measures based on 1000-word passages from three
sources: a textbook, a novel, and a newspaper. His figures show 92% of the words in the
first sample correctly translated, 89% in the second, and 89% in the third. Listening tests
using  the  same  three  passages  showed  scores  ranging  from  50%  to  90%  of  words  correctly 
understood. 


The  rules  are  embodied  as  a section  of  PDP-8  assembly  code  with  numerous  conditional 
branches  testing  the  symbol  being  translated  and  its  neighbors  [11].  Changing  the  rules 
would  presumably  involve  rewriting  part  of  the  assembly  code  and  reassembling. 


Bell  Telephone  Laboratories  System 

Another system for translating text to speech by letter-to-sound rules has been described
by McIlroy [12] at Bell Telephone Laboratories. McIlroy’s system contains more
than 750 letter-to-sound rules, which include 100 words, 580 word fragments, and 70
letters, and it occupies 11,000 bytes in a PDP-11/45. This is the typical approach taken by
a letter-to-sound rule system. The system has a small 100-word exception dictionary, with
the remainder being context-sensitive translations (the 580 word fragments).

The approach taken is to isolate a word from the input text and attempt to find it
in  the  exception  dictionary.  If  the  word  is  not  found,  capital  letters  are  converted  to 
lower-case  letters  and  leading  and  trailing  punctuation  eliminated.  The  dictionary  is  then 
searched  for  the  converted  word.  If  it  still  is  not  found,  a final  s is  removed  and  final  ie 
is  changed  to  y when  appropriate.  The  altered  word  is  looked  up.  If  none  of  the  above 
procedures  succeeds  in  finding  the  word  in  the  dictionary,  letter-to-sound  rules  are  applied. 
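
The lookup cascade just described might be sketched as below. The dictionary entries and the function name are hypothetical, and because the text does not say when the ie-to-y change is "appropriate", the sketch simply applies it to any word ending in ie after the final s is removed. A real system would also have to pronounce the stripped s afterwards, which is omitted here.

```python
import string

# Hypothetical exception dictionary; the transcriptions are illustrative only.
EXCEPTIONS = {"of": "AH V", "one": "W AH N", "city": "S IH T IY"}


def lookup(word, exceptions=EXCEPTIONS):
    """Return a stored pronunciation, or None if the letter-to-sound rules must be used."""
    # 1. Try the word exactly as it appears in the text.
    if word in exceptions:
        return exceptions[word]
    # 2. Convert to lower case and drop leading and trailing punctuation.
    w = word.lower().strip(string.punctuation)
    if w in exceptions:
        return exceptions[w]
    # 3. Remove a final s, then change a final ie to y (assumed unconditional here).
    if w.endswith("s"):
        w = w[:-1]
    if w.endswith("ie"):
        w = w[:-2] + "y"
    return exceptions.get(w)
```

For example, `lookup("Cities,")` reduces the word to city before it is found, whereas `lookup("cat")` returns `None` and the word falls through to the letter-to-sound rules.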

McIlroy’s rules specify not only phonetic output but alterations to be made in the
input string. For instance, his qu rule outputs a synthesizer code corresponding to the /k/
phoneme and also rewrites the input string so that w appears instead of u. This additional
complication allows his war rule to give the right pronunciation to the a, not only in war,
but  in  quart. 

McIlroy reported that the program performed satisfactorily for 97% of the 2000 most
common words listed in the Brown Corpus [3] and performed satisfactorily for 88% of the
tail consisting of a 1% sample of the Corpus remainder. McIlroy does not report the
criterion  of  satisfactory  performance  used. 

The  750  rules  mentioned  are  contained  in  tables  in  the  program  and  are  fairly  easy 
to  modify.  A number  of  others  however  are  embedded  in  the  program  code.  These  include 
rules  for  marking  medial  and  final  silent  e,  common  suffixes,  certain  potential  long  vowels, 
and  voiced  s.  The  system  directly  generates  codes  for  a particular  synthesizer;  no  IPA 
transcription  is  produced. 


THE  NRL  SYSTEM 

As was discussed in the Introduction, the NRL system is designed to test the conjecture
that acceptable intelligibility can be obtained with a limited set of letter-to-sound rules.
The implementation algorithm is simpler than either McIlroy’s or Ainsworth’s in that it
involves fewer ad hoc preprocessing steps before the application of the rules. McIlroy’s
final-s stripping and ie-to-y conversion are absent, as is his lookup in an exceptions dictionary.
Instead of an exceptions dictionary, we have included, for each word needing individual
treatment, a rule giving its correct pronunciation; such single-word rules make up about a
sixth of the full set. Ainsworth’s breath-group segmentation is also absent, although we
include some rules that convert punctuation into pauses of various lengths. The NRL
system, like Ainsworth’s, but unlike McIlroy’s, does no rewriting of the input string and
produces  IPA  as  the  output  of  the  rules.  The  decision  to  use  IPA  was  due  to  our  desire 
not  to  be  tied  to  a particular  synthesizer;  the  text-to-phonetics  information  is  contained 
in  device-independent  rules,  and  only  the  more  direct  phonetics-to-synthesizer  rules  need 
to  be  changed  when  it  is  desired  to  change  to  a new  synthesizer. 

Because  we  required  a convenient  means  of  changing  the  rules  in  the  course  of  their 
development,  we  have  not  immediately  proceeded  to  a hand-coded  system  (like  Ainsworth’s) 
which  incorporates  the  rules  in  the  form  of  assembly  code.  Among  the  research  tools  we 
have  developed  is  a translation  program  in  SNOBOL,  to  be  described  more  fully,  which 
contains  the  rules  as  a text  string  easily  modifiable  even  by  someone  with  no  knowledge  of 
SNOBOL. 



ELOVITZ,  JOHNSON,  McHUGH,  AND  SHORE 


Research  Tools  — Hardware 

Our  work  so  far  has  used  a commercial  speech  synthesizer,  a Federal  Screw  Works 
Votrax  VS-6  audio-response  unit.  It  can  produce  63  basic  speech  sounds  (called  “phonemes” 
by  the  manufacturer)  at  four  different  pitch  levels  (inflections)  and  string  them  together  to 
form  continuous  speech.  Although  the  Votrax  “phonemes”  do  not  correspond  exactly  to 
the  phonemes  of  English,  one  can  set  up  a fairly  straightforward  mapping  from  a phonemic 
transcription  to  Votrax  codes. 

We  used  the  synthesizer  with  a system  of  support  devices  that  provide  for  convenient 
input,  output,  and  manipulation  of  phonetic  texts.  The  speech-synthesis  laboratory  system 
(Fig.  1)  includes  a minicomputer  and  a collection  of  peripheral  devices.  Besides  the  speech 
synthesizer,  there  are  a phonetic  keyboard,  a terminal  with  twin  digital  magnetic-tape 
cassette units, a cathode-ray-tube (CRT) terminal, a teletype with paper-tape punch and
reader,  and  a modem  for  communication  with  NRL’s  PDP-10. 

Fig. 1 - The Naval Research Laboratory’s speech laboratory system


The  phonetic  keyboard,  made  by  Federal  Screw  Works  for  use  with  the  Votrax 
synthesizer,  has  a key  for  each  phoneme,  four  inflection  keys,  and  a few  control  keys. 

The terminal is a Texas Instruments (TI) 733 Silent 700 data terminal, used for typing
commands to control the system, for entering phonetic texts and other messages, and for
printing out messages and error reports. The cassette units record messages on tape and
play them back. The teletype is a backup for the TI 733 terminal and permits paper tapes
to  be  punched  and  read. 


Editing is the function of the CRT terminal, a Delta Data Systems TelTerm video-display
terminal. Messages can be sent to the screen by the system or typed there directly
from the CRT keyboard, characters can be added or deleted, and the resulting message can
be  sent  back  to  the  system  for  transmission  to  another  device.  For  example  a phonetic 
message  can  be  composed  on  the  screen,  edited,  and  spoken  out  by  the  Votrax;  it  can  then 
be  edited  further  and  spoken  out  again.  A permanent  copy  can  be  printed  on  the  TI  733 
or  teletype,  recorded  digitally  on  the  TI  733’s  cassette  unit,  or  recorded  on  an  audio  tape 
recorder. 

The  minicomputer  is  a TI  960A  computer  with  12,000  16-bit  words  of  memory.  It 
receives  messages  from  the  peripheral  devices,  transmits  messages  to  the  devices,  holds 
messages  in  buffers  in  its  memory,  and  translates  messages  to  formats  appropriate  to  the 
various  peripheral  devices.  The  messages  are  transferred  and  translated  in  response  to 
commands  that  are  usually  entered  from  the  TI  733  terminal  keyboard.  It  is  possible 
however  to  specify  another  peripheral  device  or  a memory  buffer  as  the  source  for  commands. 

The  modem  links  the  TI  960A  to  a remote  time-sharing  computer  when  computations 
are needed beyond the current capabilities of the TI 960A software. Among these computations
is the translation of English text to phonetics, which is handled by a SNOBOL program
running on NRL’s PDP-10. The procedure is to link to the PDP-10 by telephone,
start  the  SNOBOL  program,  send  it  an  English-text  message  from  the  terminal,  and  record 
on  a cassette  the  phonetic  text  received  in  reply.  The  cassette  is  then  played  back  for 
editing,  speaking  out  through  the  Votrax,  and  the  like. 


Research  Tools  — Software 

TRANS,  the  translation  program  mentioned,  accepts  text,  applies  the  translation  rules, 
and  returns  the  translated  results.  Input  may  come  from  the  terminal  or  a text  file;  output 
may  be  sent  to  a file,  the  terminal  printer,  or  the  cassette  unit.  The  complete  translation 
from  English  to  Votrax  codes  may  be  requested,  or  the  English-to-IPA  or  IPA-to-Votrax 
pass  may  be  requested  separately.  TRANS  is  described  more  completely  in  Appendix  A. 

The  rules  are  kept  in  character  strings  in  a form  easy  for  human  beings  to  read  and 
write.  They  are  interpreted  by  the  program.  Each  rule  has  the  form 

A[B]C=D

which  is  essentially  the  same  form  as  Ainsworth’s.  The  meaning  is  “The  character  string 
B,  occurring  with  left  context  A and  right  context  C,  gets  the  pronunciation  D.” 

D consists of IPA symbols, or rather a capitalized Latin-letter representation of IPA
to cater to computer character sets (Table 1). B is a letter or text fragment to be translated.
A and C are patterns; like B they may be strings of letters and other characters,
but  some  special  symbols  denote  classes  of  strings  such  as  “voiced  consonant”  and  “vowel 
cluster.”  Table  2 lists  the  symbols  that  have  such  special  interpretations.  Blanks  are 
significant,  because  they  identify  the  beginnings  and  ends  of  words. 
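
As an illustration of the rule form, the following Python sketch splits a rule string into its four parts. The `Rule` type and `parse_rule` function are hypothetical names, not part of TRANS, and the sketch assumes that the pronunciation D is written between slashes with phoneme codes separated by blanks, as in the rule examples quoted in this report.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    left: str        # A: left context pattern
    target: str      # B: the letters being translated
    right: str       # C: right context pattern
    phonemes: list   # D: output phoneme codes


# A[B]C=/D/ ; blanks inside A and C are kept because they mark word boundaries.
RULE_RE = re.compile(r"^(?P<left>[^\[]*)\[(?P<target>[^\]]+)\](?P<right>[^=]*)=/(?P<out>[^/]*)/$")


def parse_rule(text):
    """Split one rule string into left context, target, right context, and output."""
    m = RULE_RE.match(text)
    if m is None:
        raise ValueError(f"not a rule of the form A[B]C=/D/: {text!r}")
    return Rule(m["left"], m["target"], m["right"], m["out"].split())
```

Keeping the rules as plain strings and parsing them at start-up is what lets the rule set be edited without touching the program itself.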

Table 1

Latin-Letter Representation of IPA

Standard                             Standard
IPA       Representation  Example    IPA       Representation  Example

i         IY              beet       g         G               goat
ɪ         IH              bit        f         F               fault
e         EY              gate       v         V               vault
ɛ         EH              get        θ         TH              ether
æ         AE              fat        ð         DH              either
ɑ         AA              father     s         S               sue
ɔ         AO              lawn       z         Z               zoo
o         OW              lone       ʃ         SH              leash
ʊ         UH              full       ʒ         ZH              leisure
u         UW              fool       h         HH              how
ɝ         ER              murder     m         M               sum
ə         AX              about      n         N               sun
ʌ         AH              but        ŋ         NX              sung
aɪ        AY              hide       l         L               laugh
aʊ        AW              how        w         W               wear
ɔɪ        OY              toy        j         Y               young
p         P               pack       r         R               rate
b         B               back       tʃ        CH              char
t         T               time       dʒ        JH              jar
d         D               dime       hw        WH              where
k         K               coat

Table 2

Special Symbols Appearing in the English-to-IPA Translation Rules

Symbol  Meaning

#       One or more vowels*
*       One or more consonants†
.       One of (B, D, V, G, J, L, M, N, R, W, Z): a voiced consonant
$       One consonant followed by an E or I
%       One of (ER, E, ES, ED, ING, ELY): a suffix
&       One of (S, C, G, Z, X, J, CH, SH): a sibilant
@       One of (T, S, R, D, L, Z, N, J, TH, CH, SH): a consonant influencing the
        sound of a following long u (cf. rule and mule)
^       One consonant
+       One of (E, I, Y): a front vowel
:       Zero or more consonants

* Vowels are A, E, I, O, U, Y.

† Consonants are B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Z.


For  example,  a typical  rule  is 


‘ C[O]M=/AA/’

which means that an O after an initial C and before an M gets the pronunciation /ɑ/, the
a-sound in father. Another rule is


‘ :[E] =/IY/’

where the colon denotes any sequence of zero or more consonants, which means that final
e, if the only vowel in a word, gets the long-e sound /i/ of be and she.

The  translation  algorithm  scans  input  text  from  left  to  right  and,  for  each  character 
scanned,  sequentially  searches  the  rules  pertinent  to  that  character  until  it  finds  one  whose 
left-hand  side  matches  the  text  at  the  correct  position.  It  outputs  the  right-hand  side,  passes 
over  the  characters  bracketed  in  the  rule,  and  resumes  the  scan  with  the  next  character  of 
text.  The  input  string  is  never  altered. 

To  illustrate  the  operation  of  the  algorithm,  we  will  describe  a worked  example:  the 
translation  of  RATIO  using  the  English-to-IPA  rules  from  the  program  listing  of  TRANS  in 
Appendix  A. 

To the left of the first character, R, the program adds a blank to delimit the word,
and the scan starts with the R, as we indicate with a pointer: ↑RATIO. The program
searches the R rules, that is, the rules with R as the first character between brackets. The first
R rule, ‘ [RE]^#=/R IY/’, fails to match, since it requires that R be followed by E. The
next, and last, R rule, ‘[R]=/R/’, is the default; it matches any R not matched by earlier
rules. Consequently, /R/ goes into the output string, and the scan moves past the R to
A: R↑ATIO.

The search of the A rules turns up no match before ‘[A]^+#=/EY/’, which applies
when A is followed by a single consonant, a front vowel (E, I, or Y), and another vowel.
The program adds /EY/ to the output and moves the pointer past the A to T: RA↑TIO.

The first T rule that matches is ‘[TI]O=/SH/’. Consequently, /SH/ goes into the
output, and the pointer moves past TI to O: RATI↑O. The program does not search the
I rules, since the I occurs inside the brackets with the T; the string TI as a whole gets the
pronunciation /SH/ and no output phonemes correspond to I alone.

The first match among the O rules is ‘[O]=/OW/’; the program outputs /OW/ and
moves the pointer past the O to the blank at the end of the word: RATIO↑. The output
string is /R/ /EY/ /SH/ /OW/, which represents the IPA /reʃo/, the correct transcription
[12]. If the translation continued, the next matching rule would be in the set that passes
blanks, commas, periods, and other punctuation into the output string as /< >/, /<,>/, /<.>/, etc.
The program would output /< >/ and move the pointer past the blank to the beginning of
the  next  word,  if  any. 
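
The scan traced above can be reproduced with a small Python sketch. It is not the SNOBOL program of Appendix A: the rule list contains only the rules named in the example plus a few default rules that are hypothetical (marked in the comments), and reading the one-consonant symbol as ^ is an assumption.

```python
import re

VOWEL = "[AEIOUY]"
CONSONANT = "[BCDFGHJKLMNPQRSTVWXZ]"
# Only the special symbols needed for this example.
SYMBOLS = {"#": VOWEL + "+", "^": CONSONANT, "+": "[EIY]", ":": CONSONANT + "*"}

# (left context, bracketed letters, right context, output), in search order.
RULES = [
    (" ", "RE", "^#", "R IY"),
    ("", "R", "", "R"),
    ("", "A", "^+#", "EY"),
    ("", "A", "", "AE"),    # hypothetical default, not taken from the report
    ("", "TI", "O", "SH"),
    ("", "T", "", "T"),     # hypothetical default
    ("", "I", "", "IH"),    # hypothetical default
    ("", "O", "", "OW"),
]


def to_regex(pattern):
    """Turn a context pattern into a regular expression."""
    return "".join(SYMBOLS.get(ch, re.escape(ch)) for ch in pattern)


def translate(word):
    """Left-to-right scan; the first matching rule wins; the input is never altered."""
    text = " " + word.upper() + " "       # blanks delimit the word
    pos, output = 1, []
    while pos < len(text) - 1:
        for left, letters, right, phonemes in RULES:
            end = pos + len(letters)
            if (text.startswith(letters, pos)
                    and re.search("(?:" + to_regex(left) + ")$", text[:pos])
                    and re.match(to_regex(right), text[end:])):
                output.append(phonemes)
                pos = end                 # skip all the bracketed letters
                break
        else:
            pos += 1                      # no rule applies: skip the character
    return output
```

Calling `translate("ratio")` follows the same four steps as the worked example, skipping the I because it is consumed together with the T.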

The  IPA  output  string  is  the  input  to  a second  pass  that  uses  the  same  algorithm  and 
rules  of  the  same  form  to  translate  IPA  to  Votrax  codes.  The  IPA-to-Votrax  rules  are 
fewer and more straightforward than the English-to-IPA rules (for example, ‘[T]=/T/’).
Since the synthesizer automatically varies the pronunciation of its “phonemes” to suit various
contexts, the rules need not contain much context dependence. Some context-
dependent  rules  have  been  included  however  to  implement  the  manufacturer’s  suggestions 
about  liquids,  particularly  L,  adjacent  to  certain  vowels.  The  complete  set  of  rules  is 
contained  in  the  program  listings  of  TRANS  in  Appendix  A. 

Another program, DICT, was used during rule development to ensure that a rule change
proposed to fix up a dozen mispronounced words would not ruin a hundred others previously
translated correctly. DICT accepts a pattern like the left-hand side of a rule but
without brackets; it gives the same interpretations as TRANS to the same special symbols.
After reading the pattern, DICT searches a file of words and outputs the words that contain
a match.  The  program  is  described  in  Appendix  B. 
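
A search of this kind might look as follows in Python. This is a sketch rather than the DICT program of Appendix B: each special symbol of Table 2 is mapped to a regular expression, the symbols for "voiced consonant" and "one consonant" are assumed to be . and ^, and the word list is passed in directly instead of being read from a file.

```python
import re

VOWEL = "[AEIOUY]"
CONSONANT = "[BCDFGHJKLMNPQRSTVWXZ]"
SYMBOLS = {
    "#": VOWEL + "+",                      # one or more vowels
    "*": CONSONANT + "+",                  # one or more consonants
    ".": "[BDVGJLMNRWZ]",                  # a voiced consonant (symbol assumed)
    "$": CONSONANT + "[EI]",               # one consonant followed by E or I
    "%": "(?:ER|ES|ED|ING|ELY|E)",         # a suffix
    "&": "(?:CH|SH|[SCGZXJ])",             # a sibilant
    "@": "(?:TH|CH|SH|[TSRDLZNJ])",        # consonant affecting a following long u
    "^": CONSONANT,                        # one consonant (symbol assumed)
    "+": "[EIY]",                          # a front vowel
    ":": CONSONANT + "*",                  # zero or more consonants
}


def to_regex(pattern):
    """Translate a bracket-free rule pattern into a regular expression."""
    return "".join(SYMBOLS.get(ch, re.escape(ch)) for ch in pattern)


def dict_search(pattern, words):
    """Return the words that contain a match; blanks in the pattern mark word edges."""
    matcher = re.compile(to_regex(pattern))
    return [w for w in words if matcher.search(" " + w.upper() + " ")]
```

For example, `dict_search("A^+#", ["ratio", "rat", "nation", "cat"])` lists the words that a proposed rule for A before a consonant, a front vowel, and another vowel would affect.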

DICT must read the entire file of words and convert to SNOBOL internal representation
before searching. Although we have a copy of the frequency-ordered list of words in
the  Brown  Corpus  [3]  on  line,  core-size  restrictions  have  limited  us  to  searching  a few 
thousand  words  at  a time.  DICT  was  complemented  by  the  on-line  text-editing  program 
SOS,  which  can  search  an  entire  text  file  for  patterns.  Pattern  searching  in  SOS  is  less 
convenient  than  in  DICT;  for  instance,  one  cannot  specify  “consonant”  as  an  element  of 
an  SOS  search  pattern.  However  with  SOS  we  could  search  the  entire  50,000-word  Brown 
Corpus  file. 

The Brown Corpus comprises 500 samples of English text written in a wide variety of
styles.  Each  sample  is  roughly  2000  words  long,  and  the  entire  Corpus  totals  slightly  more 
than  a million  words.  The  file  we  use  lists  the  roughly  50,000  individual  words  occurring 
in  the  Corpus,  arranged  in  decreasing  order  of  frequency.  The  entry  for  each  word  contains 
some  items  of  numerical  information,  including  frequency  (the  number  of  occurrences  of 
the  word  in  the  Corpus)  and  number  of  texts  (the  number  of  text  samples,  among  the  500 
comprising  the  Corpus,  in  which  the  word  occurs). 

One  output  that  can  be  requested  from  TRANS  is  a stat  file  — a file  listing  every 
instance  of  every  rule  used  in  translating  every  word  in  a text  file.  A program  STAT  reads 
stat files and produces statistics on the relative importance of the rules. For each rule STAT
counts the words in whose translation the rule was used, sums the frequencies of those
words, and sums the number of text samples, among the 500 in the Corpus, in which each
of those words appears. The output comprises these three absolute results together with
the  relative  results  obtained  by  normalizing  the  absolute  ones  so  that  their  sums  over  all 
rules  are  1. 
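
The computation performed by STAT can be sketched as follows. This is an illustrative Python version, not the original program; the record layout (one rule/word pair per rule instance), the function name, and the sample frequencies are assumptions. Each instance of a rule contributes once, so a word in which a rule fires twice is counted twice for that rule.

```python
from collections import defaultdict

def rule_statistics(rule_uses, corpus):
    """Per-rule absolute and relative statistics.

    rule_uses: iterable of (rule, word) pairs, one per rule instance.
    corpus:    dict mapping word -> (frequency, number_of_texts).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # words, frequency sum, texts sum
    for rule, word in rule_uses:
        frequency, texts = corpus[word]
        entry = totals[rule]
        entry[0] += 1
        entry[1] += frequency
        entry[2] += texts
    # Column sums over all rules, used to normalize each column to a total of 1.
    grand = [sum(entry[i] for entry in totals.values()) for i in range(3)]
    return {
        rule: {
            "abs": tuple(entry),
            "rel": tuple(entry[i] / grand[i] for i in range(3)),
        }
        for rule, entry in totals.items()
    }

# Hypothetical corpus entries and stat-file records (numbers invented).
corpus = {"MEAT": (10, 5), "READY": (30, 15), "ADDED": (20, 10)}
uses = [
    ("[EA]=/IY/", "MEAT"), ("[T]=/T/", "MEAT"),
    ("[EA]D=/EH/", "READY"), ("[D]=/D/", "READY"),
    ("[D]=/D/", "ADDED"), ("[D]=/D/", "ADDED"),
]
stats = rule_statistics(uses, corpus)
print(stats["[D]=/D/"])
```

Rules with very small relative values in such output are the natural candidates for pruning.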

Pre- and postprocessors were written to enable the time-sharing system SORT utility
to produce from a stat file a file giving, for each rule, a list of all the words in whose
translation the rule was used. This provides a detailed analysis of the interactions of a set of
rules. A program for line-by-line comparison of two files was used to compare translations
of a text file by different sets of rules. In scoring the results of translating a set of words,
a program was used that accepts a user’s “good/bad” judgments on translated words and
accumulates total and frequency-weighted total scores.
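
The scoring arithmetic is simple enough to sketch. The following Python fragment is an illustration only; the function name, the input format, and the sample judgments are hypothetical. It accumulates a plain score (each distinct word counts once) and a frequency-weighted score (each word counts in proportion to its Corpus frequency).

```python
def score(judgments):
    """Accumulate plain and frequency-weighted scores.

    judgments: iterable of (word, corpus_frequency, is_good) triples.
    """
    words = words_good = freq = freq_good = 0
    for _word, frequency, is_good in judgments:
        words += 1
        freq += frequency
        if is_good:
            words_good += 1
            freq_good += frequency
    return {
        "words": words,
        "words_good": words_good,
        "percent": 100.0 * words_good / words if words else 0.0,
        "freq": freq,
        "freq_good": freq_good,
        "percent_weighted": 100.0 * freq_good / freq if freq else 0.0,
    }

# Hypothetical judgments; the frequencies are invented for illustration.
result = score([("THE", 900, True), ("GREAT", 90, False), ("BEAD", 10, True)])
print(result)
```

Because a few very common words carry most of the total frequency, the weighted percentage can be much higher than the plain one when the common words are translated correctly.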


Rule  Development 

Our starting point, version 1 of the rules, was a modification of Ainsworth’s set. The
main alterations were changes in the right-hand sides to Americanize the accent and
additions to handle final S, ES, and ED correctly. Then began a development cycle with the
following steps:

1.  Translate.  With  version  1 we  translated  the  most  frequent  4000  words  in  the 
Brown  Corpus.  With  later  versions  we  included  samples  from  deeper  in  the  corpus. 

2. Examine results. We had much of the translated output spoken by the synthesizer
and listened to it, marking mistakes on a printed listing. Kenyon and Knott’s pronouncing
dictionary [13] was the arbiter in case of doubt or disagreement as to what constituted a
mistake. (The authors’ linguistic backgrounds are diverse enough that disagreements were
fairly frequent.) Later in the project we grew proficient enough at reading the machine
representation of IPA to risk checking some samples visually, but we never abandoned the
practice of listening to at least part of the output from each version of the rules. The
major goal was a good IPA transcription. In the few cases where a correct transcription
still sounded strange, the IPA-to-Votrax rules were fixed up when possible, and the problem
was otherwise blamed on the synthesizer.

3. Classify errors. We divided the mispronounced words into lists with headings like
“TH problem,” “Silent E problem,” “Long A problem,” and “Stress problems.” Then we
scanned the lists to identify specific letter patterns being frequently mistranslated.

4. Modify. For a given frequently mistranslated letter pattern, we would find all
sufficiently frequent words, mistranslated or not, that matched the pattern. If the correct
pronunciations agreed in a majority of cases, or in even a clear plurality of cases, we wrote
a new or altered rule to give that pronunciation; otherwise we tried a more specific context.
For example, version 1 had no rule for the EA combination, which has a great variety of
pronunciations: great, heart, ready, sea, earth. Most words containing EA showed up on
the “EA problem” list. We found the long-e pronunciation /i/ in roughly half of them. The
addition of a rule ‘[EA]=/IY/’ was justified, since it improved many words and did not
harm the rest. Meat received the correct pronunciation /mit/, and great was no worse as
/grit/ than it had been as /grɛæt/. During the second round of development many EA
words still showed up as problems, but a search with DICT turned up the large number now
getting the correct pronunciation. Looking for a more specific pattern, we found lots of
EAD words on the problem list. A search of the Corpus for EAD words suggested adding
a rule ‘[EA]D=/EH/’, which fixes ready, changes one acceptable pronunciation of lead to
another, and hurts a few previously correct words like bead. The additions and alterations
continued until the accumulation of changes made the interactions between rules hard to
keep track of.

5. Iterate. Having produced a new version, we would start the cycle over by translating
several thousand words. We went through the cycle twice, ending with version 3.
Before testing version 3 we pruned the rules by looking at the STAT outputs for version
2 and removing rules that were rarely used. Hence the rules for initial PT and initial X,
although quite reliable, were dropped as unimportant.
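
The effect of adding the two EA rules in step 4 can be imitated with a toy translator. The Python sketch below assumes that rules are tried in order and that the first matching rule wins, so the more specific ‘[EA]D=/EH/’ has to precede the general ‘[EA]=/IY/’. It handles literal contexts only (no special context symbols), and the single-letter rules are hypothetical stand-ins, not the rules of Table 6.

```python
# Each rule is (left context, matched letters, right context, phonemes).
# Hypothetical single-letter rules, just enough for the example words.
LETTERS = [
    ("", "B", "", "B"), ("", "D", "", "D"), ("", "G", "", "G"),
    ("", "M", "", "M"), ("", "R", "", "R"), ("", "T", "", "T"),
    ("", "Y", "", "IY"),
]

# After adding '[EA]=/IY/' only.
VERSION_A = [("", "EA", "", "IY")] + LETTERS
# After also adding the more specific '[EA]D=/EH/' ahead of it.
VERSION_B = [("", "EA", "D", "EH")] + VERSION_A

def translate(word, rules):
    """Scan left to right, applying the first rule that matches at each position."""
    phonemes = []
    pos = 0
    while pos < len(word):
        for left, match, right, out in rules:
            end = pos + len(match)
            if (word.startswith(match, pos)
                    and word[:pos].endswith(left)
                    and word.startswith(right, end)):
                phonemes.append(out)
                pos = end
                break
        else:
            raise ValueError(f"no rule matches {word!r} at position {pos}")
    return " ".join(phonemes)

for w in ("MEAT", "GREAT", "READY", "BEAD"):
    print(w, "|", translate(w, VERSION_A), "|", translate(w, VERSION_B))
```

With the second rule set, READY improves while BEAD gets worse, which is the kind of trade-off the development cycle had to weigh.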


Testing 

We tested version 3 by translating the 8000 most frequent words plus a 1000-word
sample selected from the tail of the corpus (words with frequencies of 1 or 2 per million).
The first 5000 words and the tail sample were scored like the translations by earlier versions:
the criterion for correctness was a good IPA transcription, and, although we did not look
up most words in a pronouncing dictionary, Kenyon and Knott [13] was the arbiter when
questions arose. Numbers, symbols, and abbreviations were excluded from the scoring.
Any transcription accepted by Kenyon and Knott was allowed, not just the preferred one. Some
deviations were allowed. The horse:hoarse distinction (/ɔr/ vs /or/) was ignored, as were
the Mary:merry:marry distinction and similar distinctions involving vowels followed by
R. Doubled consonants (/bɪttɚ/ instead of /bɪtɚ/ for bitter) were not counted as errors.
Otherwise we tried to be quite strict in scoring consonants and stressed vowels. Sometimes
an unstressed vowel translated with the full or stressed pronunciation was classed as a
“stress problem” rather than a mistake, if vowel reduction upon stressing would give a good
transcription. Thus /æbaUt/ instead of /əbaUt/ for about, though marked as a stress
problem, was not scored as an error. Some subjectivity entered here. Stress problems
judged less severe than that in about were sometimes not marked at all; more severe ones
were sometimes scored as errors.


NRL  REPORT  7948 


RESULTS 

Table 3 gives the result of scoring IPA transcriptions of 1000-word samples from the
Brown Corpus. The first three columns are based on a count of the number of distinct
words correctly translated and the total number translated. The last three columns are
based on the sums of the frequencies of the correctly translated words and of all the
translated words. The frequencies were obtained from the Corpus; they give the number of
times the word appeared and thus represent roughly parts per million. The first rows are
based on successive 1000-word samples, starting from the beginning of the Corpus; the last
is based on 1000 words selected from the tail of the Corpus (1/18 of the words with 2
occurrences per million and 1/36 of those with 1 per million).


Table 3

Scores and Frequency-Weighted Scores for 1000-Word Samples from the
Brown Corpus Translated by Version 3 of the Rules

          No. of   No. of              Total          Total           Percent Correct
          Words    Words    Percent    Frequency of   Frequency of    (Frequency
Sample    Scored   Correct  Correct    Words Scored   Correct Words   Weighted)

1           976      847     86.8        691,375        664,564         96.1
2           974      808     83.0         72,966         60,862         83.4
3           973      744     76.5         43,664         33,401         76.5
4           988      757     76.6         30,391         23,315         76.6
5           971      707     72.8         21,601         15,743         72.9
Tail        922      599     65.0          1,295            849         65.6

Table  4 gives  similar,  cumulative  results  based  on  the  first  1000,  first  2000,  first  3000, 
etc.  words  of  the  Corpus;  the  last  line  is  an  estimate,  derived  from  the  foregoing,  of  the  results 
that  would  have  been  obtained  had  the  entire  Corpus  been  translated  and  scored.  The  upper 
bounds  were  computed  under  the  assumption  that  the  error  rate  observed  in  the  fifth  1000- 
word  sample  (Table  3)  held  constant  up  to  the  beginning  of  the  tail  sample;  the  lower  bounds 
assume  that  the  error  rate  following  the  first  5000  words  is  equal  to  that  observed  in  the  tail. 
The  figures  89%  to  90%  in  the  last  column  mean  that,  assuming  the  Corpus  frequencies  are 
representative,  we  would  expect  to  correctly  translate  89%  to  90%  of  the  words  in  a random 
sample  of  English  text. 
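
The cumulative figures of Table 4 follow from the per-sample counts of Table 3 by running sums, as the Python sketch below shows. The final extrapolation is a cruder version of the estimate described above: it applies a single assumed rate to everything beyond the first 5000 words, and it takes the Corpus total to be exactly 1,000,000 word occurrences, which is an assumption (the text says only "slightly more than a million"). The function names are hypothetical.

```python
# One row per successive 1000-word sample in Table 3:
# (words scored, words correct, frequency scored, frequency correct)
TABLE_3 = [
    (976, 847, 691375, 664564),
    (974, 808, 72966, 60862),
    (973, 744, 43664, 33401),
    (988, 757, 30391, 23315),
    (971, 707, 21601, 15743),
]

def cumulative(rows):
    """Running totals and percentages over the first n samples."""
    totals = [0, 0, 0, 0]
    result = []
    for row in rows:
        totals = [t + r for t, r in zip(totals, row)]
        scored, correct, freq, freq_correct = totals
        result.append((scored, correct, round(100 * correct / scored, 1),
                       freq, freq_correct, round(100 * freq_correct / freq, 1)))
    return result

def extrapolate(freq_seen, freq_correct_seen, corpus_total, rate_rest):
    """Weighted percent correct if the unscored remainder is right at rate_rest."""
    rest = corpus_total - freq_seen
    return 100 * (freq_correct_seen + rate_rest * rest) / corpus_total

TABLE_4 = cumulative(TABLE_3)
for n, row in enumerate(TABLE_4, start=1):
    print(n, row)

_, _, _, FREQ_5000, FREQ_OK_5000, _ = TABLE_4[-1]
ASSUMED_TOTAL = 1_000_000  # assumption, see the note above
upper = extrapolate(FREQ_5000, FREQ_OK_5000, ASSUMED_TOTAL, 0.729)  # rate of sample 5
lower = extrapolate(FREQ_5000, FREQ_OK_5000, ASSUMED_TOTAL, 0.656)  # rate of tail sample
print(round(lower, 1), round(upper, 1))
```

Under these assumptions the two bounds land close to the 89% and 90% quoted for the entire Corpus.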

Table 5 gives results for the first 1000 words as translated at various stages of rule
development.

Table 4

Cumulative Scores and Frequency-Weighted Scores for the First n Thousand Words
of the Brown Corpus Translated by Version 3 of the Rules

          No. of   No. of              Total          Total           Percent Correct
          Words    Words    Percent    Frequency of   Frequency of    (Frequency
n         Scored   Correct  Correct    Words Scored   Correct Words   Weighted)

1           976      847     86.8        691,375        664,564         96.1
2          1950     1655     84.9        764,341        725,426         94.9
3          2923     2399     82.1        808,005        758,827         93.9
4          3911     3156     80.7        838,396        782,142         93.3
5          4882     3863     79.1        859,997        797,885         92.8
Entire
Corpus
(est.)                     66 to 69                                   89 to 90

Table 5

Scores and Frequency-Weighted Scores for the First 1000 Words of the Brown Corpus
Translated by Various Versions of the Rules

                    No. of   No. of              Total          Total           Percent Correct
          No. of    Words    Words    Percent    Frequency of   Frequency of    (Frequency
Version   Rules*    Scored   Correct  Correct    Words Scored   Correct Words   Weighted)

1          182        976      428     43.9        691,375        470,575         68.1
2          264        977      688     70.4        691,497        606,287         87.7
3          319        976      847     86.8        691,375        664,564         96.1

*These counts exclude rules for the ten digits and for all punctuation symbols except . , - ’ ?
and blank.


Table 6 gives version 3 of the English-to-IPA rules together with the statistics computed
by STAT for the first 8000 words of the Corpus. The first column gives the number of
distinct words that matched each rule. Column 2 is column 1 normalized to a total of 1.
Column 3 gives the sum of the frequencies of the words matching each rule, and column
4 is column 3 normalized. Column 5 sums the number of texts in which the words
occurred, and column 6 is column 5 normalized. If a rule was used more than once in
translating a word, that word contributed more than once to the word count, frequency
sum, and number-of-texts sum for the given rule. Table 7 is based on the 1000-word
sample selected from the tail of the Corpus. Table 8 gives the rules for translation from
IPA to Votrax codes, together with STAT results based on the first 8000 words as translated
by version 3 of the rules.

Table 6

STAT Results for the First 8000 Words of the Brown Corpus
Translated by Version 3 of the Rules

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** A RULE ***
[A] =/AX/                  94  0.0021493     26051  0.0090661      1697  0.0012855
 [ARE] =/AA R/              1  0.0000229      4393  0.0015288       453  0.0003431
 [AR]O=/AX R/               3  0.0000686       599  0.0002085       288  0.0002182
[AR]#=/EH R/              151  0.0034526      6320  0.0021994      4020  0.0030451
^[AS]#=/EY S/              18  0.0004116      1334  0.0004642       766  0.0005802
[A]WA=/AX/                 10  0.0002286       728  0.0002534       417  0.0003159
[AW]=/AO/                  23  0.0005259      1256  0.0004371       719  0.0005440
 :[ANY]=/EH N IY/           9  0.0002058      2954  0.0010280      1201  0.0009097
[A]^+#=/EY/               221  0.0050532      8369  0.0029125      4588  0.0034754
#:[ALLY]=/AX L IY/         46  0.0010518      1920  0.0006682      1556  0.0011787
 [AL]#=/AX L/              17  0.0003887       898  0.0003125       578  0.0004378
[AGAIN]=/AX G EH N/         2  0.0000457      1204  0.0004190       555  0.0004204
#:[AG]E=/IH JH/            49  0.0011204      1799  0.0006261      1087  0.0008234
[A]^+:#=/AE/              193  0.0044129      7458  0.0025955      4608  0.0034905
 :[A]^+ =/EY/              89  0.0020350      9944  0.0034606      4838  0.0036647
[A]^%=/EY/                232  0.0053047      8750  0.0030451      5778  0.0043768
 [ARR]=/AX R/              13  0.0002972       329  0.0001145       276  0.0002091
[ARR]=/AE R/               22  0.0005030       841  0.0002927       544  0.0004121
 :[AR] =/AA R/              7  0.0001601       849  0.0002955       408  0.0003091
[AR] =/ER/                 24  0.0005488       986  0.0003431       666  0.0005045
[AR]=/AA R/               211  0.0048245     10137  0.0035278      5952  0.0045086
[AIR]=/EH R/               27  0.0006174      1244  0.0004329       767  0.0005810
[AI]=/EY/                 163  0.0037270      6774  0.0023574      4490  0.0034011
[AY]=/EY/                  97  0.0022179      8739  0.0030413      4305  0.0032610
[AU]=/AO/                  59  0.0013490      2743  0.0009546      1512  0.0011453
#:[AL] =/AX L/            201  0.0045959     11422  0.0039750      6166  0.0046707
#:[ALS] =/AX L Z/          12  0.0002744       484  0.0001684       285  0.0002159
[ALK]=/AO K/               10  0.0002286       694  0.0002415       483  0.0003659
[AL]^=/AO L/              109  0.0024923     10348  0.0036012      4535  0.0034352
 :[ABLE]=/EY B AX L/        4  0.0000915       488  0.0001698       319  0.0002416
[ABLE]=/AX B AX L/         45  0.0010289      1342  0.0004670      1005  0.0007613
[ANG]+=/EY N JH/           29  0.0006631      1495  0.0005203       985  0.0007461
[A]=/AE/                 1482  0.0338859    118519  0.0412462     39864  0.0301967

Total                    3673  0.0839831    261411  0.0909745    105711  0.0800752

Table 6 (continued)

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** B RULE ***
 [BE]^#=/B IH/             35  0.0008003      4727  0.0016451      2616  0.0019816
[BEING]=/B IY IH NX/        2  0.0000457       748  0.0002603       361  0.0002735
 [BOTH] =/B OW TH/          1  0.0000229       730  0.0002540       337  0.0002553
 [BUS]#=/B IH Z/            4  0.0000915       484  0.0001684       237  0.0001795
[BUIL]=/B IH L/             6  0.0001372       481  0.0001674       291  0.0002204
[B]=/B/                   729  0.0166686     50010  0.0174042     19966  0.0151241

Total                     777  0.0177661     57180  0.0198994     23808  0.0180344

*** C RULE ***
 [CH]^=/K/                  9  0.0002058       392  0.0001364       138  0.0001045
^E[CH]=/K/                 10  0.0002286       451  0.0001570       266  0.0002015
[CH]=/CH/                 215  0.0049160     16131  0.0056138      6955  0.0052684
 S[CI]#=/S AY/              5  0.0001143       305  0.0001061       151  0.0001144
[CI]A=/SH/                 35  0.0008003      1763  0.0006135      1077  0.0008158
[CI]O=/SH/                 10  0.0002286       230  0.0000800       173  0.0001310
[CI]EN=/SH/                 7  0.0001601       307  0.0001068       224  0.0001697
[C]+=/S/                  475  0.0108609     23550  0.0081957     14371  0.0108859
[CK]=/K/                   98  0.0022408      4217  0.0014676      2415  0.0018293
[COM]%=/K AH M/            13  0.0002972      1706  0.0005937      1017  0.0007704
[C]=/K/                  1482  0.0338859     65195  0.0226887     38499  0.0291627

Total                    2359  0.0539385    114247  0.0397595     65286  0.0494536

*** D RULE ***
#:[DED] =/D IH D/          51  0.0011661      1927  0.0006706      1540  0.0011665
.E[D] =/D/                312  0.0071339     12985  0.0045190      9639  0.0073015
#^:E[D] =/T/              140  0.0032011      6040  0.0021020      4502  0.0034102
 [DE]^#=/D IH/            124  0.0028353      4867  0.0016938      3158  0.0023922
 [DO] =/D UW/               1  0.0000229      1363  0.0004743       396  0.0003000
 [DOES]=/D AH Z/            2  0.0000457       572  0.0001991       318  0.0002409
 [DOING]=/D UW IH NX/       1  0.0000229       163  0.0000567       124  0.0000939
 [DOW]=/D AW/               4  0.0000915       964  0.0003355       340  0.0002575
[DU]A=/JH UW/              12  0.0002744       503  0.0001750       330  0.0002500
[D]=/D/                  1301  0.0297473    102440  0.0356505     40797  0.0309034

Total                    1948  0.0445410    131824  0.0458765     61144  0.0463161

Table 6 (continued)

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** E RULE ***
#:[E] =/ /               1006  0.0230022     73857  0.0257032     38891  0.0294596
'^:[E] =/ /                 7  0.0001601       519  0.0001806       315  0.0002386
 :[E] =/IY/                19  0.0004344     23483  0.0081724      2257  0.0017097
#[ED] =/D/                 14  0.0003201       641  0.0002231       495  0.0003750
#:[E]D =/ /               446  0.0101978     18502  0.0064389     13820  0.0104685
[EV]ER=/EH V/              20  0.0004573      3258  0.0011338      1915  0.0014506
[E]^%=/IY/                106  0.0024237      4302  0.0014972      3007  0.0022778
[ERI]#=/IY R IY/           21  0.0004802      1508  0.0005248       873  0.0006613
[ERI]=/EH R IH/            24  0.0005488      1423  0.0004952       724  0.0005484
#:[ER]#=/ER/              115  0.0026295      6410  0.0022308      3657  0.0027701
[ER]#=/EH R/               17  0.0003887      1110  0.0003863       532  0.0004030
[ER]=/ER/                 622  0.0142220     33594  0.0116912     18042  0.0136667
 [EVEN]=/IY V EH N/         7  0.0001601      1564  0.0005443       685  0.0005189
#:[E]W=/ /                 10  0.0002286       173  0.0000602       127  0.0000962
@[EW]=/UW/                 20  0.0004573      2819  0.0009810      1019  0.0007719
[EW]=/Y UW/                 1  0.0000229       601  0.0002092       311  0.0002356
[E]O=/IY/                  27  0.0006174       792  0.0002756       426  0.0003227
#:&[ES] =/IH Z/           116  0.0026523      4265  0.0014843      2599  0.0019687
#:[E]S =/ /               264  0.0060364     11065  0.0038508      6612  0.0050085
#:[ELY] =/L IY/            45  0.0010289      1834  0.0006383      1461  0.0011067
#:[EMENT]=/M EH N T/       37  0.0008460      1437  0.0005001       854  0.0006469
[EFUL]=/F UH L/             7  0.0001601       281  0.0000978       233  0.0001765
[EE]=/IY/                 168  0.0038413     13544  0.0047135      6845  0.0051850
[EARN]=/ER N/               8  0.0001829       345  0.0001201       268  0.0002030
 [EAR]^=/ER/                7  0.0001601       751  0.0002614       447  0.0003386
[EAD]=/EH D/               29  0.0006631      2297  0.0007994      1517  0.0011491
#:[EA] =/IY AX/             3  0.0000686       530  0.0001844       278  0.0002106
[EA]SU=/EH/                 9  0.0002058       440  0.0001531       260  0.0001969
[EA]=/IY/                 302  0.0069052     17378  0.0060478     10646  0.0080643
[EIGH]=/EY/                16  0.0003658       534  0.0001858       358  0.0002712
[EI]=/IY/                  31  0.0007088      1349  0.0004695       849  0.0006431
 [EYE]=/AY/                 3  0.0000686       533  0.0001855       238  0.0001803
[EY]=/IY/                  30  0.0006859      1169  0.0004068       604  0.0004575
[EU]=/Y UW/                11  0.0002515       364  0.0001267       170  0.0001288
[E]=/EH/                 2065  0.0472162     95200  0.0331309     57157  0.0432960

Total                    5633  0.1287984    327872  0.1141039    178492  0.1352063

Table 6 (continued)

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** F RULE ***
[FUL]=/F UH L/             29  0.0006631      1043  0.0003630       817  0.0006189
[F]=/F/                   736  0.0168286     58778  0.0204555     26736  0.0202523

Total                     765  0.0174917     59821  0.0208185     27553  0.0208712

*** G RULE ***
[GIV]=/G IH V/              6  0.0001372      1015  0.0003532       657  0.0004977
 [G]I^=/G/                  8  0.0001829       475  0.0001653       225  0.0001704
[GE]T=/G EH/               12  0.0002744      1504  0.0005234       788  0.0005969
SU[GGES]=/G JH EH S/        6  0.0001372       258  0.0000898       223  0.0001689
[GG]=/G/                   20  0.0004573       399  0.0001389       287  0.0002174
 B#[G]=/G/                 10  0.0002286      1102  0.0003835       694  0.0005257
[G]+=/JH/                 176  0.0040242      7355  0.0025596      4170  0.0031587
[GREAT]=/G R EY T/          5  0.0001143      1014  0.0003529       546  0.0004136
#[GH]=/ /                  11  0.0002515       522  0.0001817       366  0.0002772
[G]=/G/                   347  0.0079341     15701  0.0054642      8692  0.0065841

Total                     601  0.0137419     29345  0.0102125     16648  0.0126107

*** H RULE ***
 [HAV]=/HH AE V/            5  0.0001143      4284  0.0014909       730  0.0005530
 [HERE]=/HH IY R/           2  0.0000457       761  0.0002648       325  0.0002462
 [HOUR]=/AW ER/             2  0.0000457       319  0.0001110       209  0.0001583
[HOW]=/HH AW/               8  0.0001829      1583  0.0005509       744  0.0005636
[H]#=/HH/                 296  0.0067680     45711  0.0159080     11372  0.0086142
[H]=/ /                    21  0.0004802       976  0.0003397       360  0.0002727

Total                     334  0.0076369     53634  0.0186654     13740  0.0104079

Table 6 (continued)

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** I RULE ***
 [IN]=/IH N/              202  0.0046187     31259  0.0108786      6176  0.0046783
 [I] =/AY/                  6  0.0001372      5894  0.0020512       692  0.0005242
[IN]D=/AY N/               22  0.0005030      2022  0.0007037      1253  0.0009491
[IER]=/IY ER/              11  0.0002515       419  0.0001458       282  0.0002136
#:R[IED] =/IY D/            6  0.0001372       348  0.0001211       250  0.0001894
[IED] =/AY D/              24  0.0005488      1009  0.0003511       769  0.0005825
[IEN]=/IY EH N/            17  0.0003887       700  0.0002436       416  0.0003151
[IE]T=/AY EH/              13  0.0002972       779  0.0002711       403  0.0003053
 :[I]%=/AY/                10  0.0002286       277  0.0000964       215  0.0001629
[I]%=/IY/                  88  0.0020121      2808  0.0009772      1520  0.0011514
[IE]=/IY/                  36  0.0008231      1811  0.0006303      1139  0.0008628
[I]^+:#=/IH/              384  0.0087802     15196  0.0052884      9640  0.0073022
[IR]#=/AY R/               51  0.0011661      2006  0.0006981      1378  0.0010438
[IZ]%=/AY Z/               19  0.0004344       697  0.0002426       521  0.0003947
[IS]%=/AY Z/               32  0.0007317      1027  0.0003574       799  0.0006052
[I]D%=/AY/                 40  0.0009146      2544  0.0008853      1710  0.0012953
+^[I]^+=/IH/               74  0.0016920      2855  0.0009936      1737  0.0013158
[I]T%=/AY/                 24  0.0005488      2043  0.0007110      1119  0.0008476
#^:[I]^+=/IH/             232  0.0053047      9645  0.0033566      5899  0.0044684
[I]^+=/AY/                116  0.0026523     10713  0.0037283      5221  0.0039549
[IR]=/ER/                  42  0.0009603      3221  0.0011210      1603  0.0012143
[IGH]=/AY/                 55  0.0012576      4271  0.0014864      2451  0.0018566
[ILD]=/AY L D/             11  0.0002515       810  0.0002819       382  0.0002894
[IGN] =/AY N/               3  0.0000686       226  0.0000787       116  0.0000879
[IGN]^=/AY N/               4  0.0000915       176  0.0000612        89  0.0000674
[IGN]%=/AY N/               4  0.0000915       216  0.0000752       147  0.0001114
[IQUE]=/IY K/               4  0.0000915       229  0.0000797       147  0.0001114
[I]=/IH/                 2038  0.0465988    128923  0.0448669     55411  0.0419734

Total                    3568  0.0815823    232124  0.0807823    101485  0.0768741

*** J RULE ***
[J]=/JH/                  125  0.0028581      6066  0.0021110      3099  0.0023475

Total                     125  0.0028581      6066  0.0021110      3099  0.0023475

*** K RULE ***
 [K]N=/ /                  13  0.0002972      1847  0.0006428       961  0.0007279
[K]=/K/                   224  0.0051218     13401  0.0046637      7403  0.0056077

Total                     237  0.0054190     15248  0.0053065      8364  0.0063357

Table 6 (continued)

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** L RULE ***
[LO]C#=/L OW/               9  0.0002058       514  0.0001789       265  0.0002007
L[L]=/ /                  236  0.0053961     17526  0.0060993      8048  0.0060963
#^:[L]%=/AX L/            108  0.0024694      6084  0.0021173      3428  0.0025967
[LEAD]=/L IY D/             7  0.0001601       515  0.0001792       343  0.0002598
[L]=/L/                  1755  0.0401280     85646  0.0298060     49966  0.0378488

Total                    2115  0.0483594    110285  0.0383807     62050  0.0470024

*** M RULE ***
[MOV]=/M UW V/             12  0.0002744       930  0.0003237       630  0.0004772
[M]=/M/                  1370  0.0313250     88465  0.0307870     42262  0.0320131

Total                    1382  0.0315994     89395  0.0311107     42892  0.0324903

*** N RULE ***
E[NG]+=/N JH/               9  0.0002058       270  0.0000940       144  0.0001091
[NG]R=/NX G/                9  0.0002058       353  0.0001228       164  0.0001242
[NG]#=/NX G/               30  0.0006859      1036  0.0003605       704  0.0005333
[NGL]%=/NX G AX L/          4  0.0000915       254  0.0000884       144  0.0001091
[NG]=/NX/                 526  0.0120270     23241  0.0080882     15847  0.0120040
[NK]=/NX K/                38  0.0008689      1577  0.0005488       961  0.0007279
 [NOW] =/N AW/              1  0.0000229      1314  0.0004573       394  0.0002985
[N]=/N/                  2446  0.0559277    170584  0.0593655     73610  0.0557590

Total                    3063  0.0700354    198629  0.0691256     91968  0.0696650

Table 6 (continued)

                             No. of        Total Frequencies    Total No. of Texts
                         Words Matched      of Words Matched    for Words Matched
Rule                     Abs.  Relative       Abs.  Relative       Abs.  Relative

*** O RULE ***
[OF] =/AX V/                2  0.0000457     36427  0.0126771       509  0.0003856
[OROUGH]=/ER OW/            2  0.0000457        61  0.0000212        57  0.0000432
#:[OR] =/ER/               69  0.0015777      2711  0.0009435      1506  0.0011408
#:[ORS] =/ER Z/            22  0.0005030       624  0.0002172       355  0.0002689
[OR]=/AO R/               360  0.0082314     32460  0.0112965     11195  0.0084801
 [ONE]=/W AH N/             4  0.0000915      3487  0.0012135       637  0.0004825
[OW]=/OW/                 112  0.0025609      7450  0.0025927      4514  0.0034193
 [OVER]=/OW V ER/           9  0.0002058      1398  0.0004865       549  0.0004159
[OV]=/AH V/                70  0.0016005      3713  0.0012922      2170  0.0016438
[O]^%=/OW/                134  0.0030639      7003  0.0024371      4611  0.0034928
[O]^EN=/OW/                32  0.0007317      1849  0.0006435      1217  0.0009219
[O]^I#=/OW/                40  0.0009146      1728  0.0006014       842  0.0006378
[OL]D=/OW L/               27  0.0006174      2161  0.0007521      1118  0.0008469
[OUGHT]=/AO T/              9  0.0002058      1072  0.0003731       665  0.0005037
[OUGH]=/AH F/               5  0.0001143       544  0.0001893       351  0.0002659
 [OU]=/AW/                 15  0.0003430      3895  0.0013555      1104  0.0008363
H[OU]S#=/AW/                8  0.0001829       932  0.0003243       425  0.0003219
[OUS]=/AX S/               56  0.0012804      2031  0.0007068      1492  0.0011302
[OUR]=/AO R/               28  0.0006402      1955  0.0006804      1139  0.0008628
[OULD]=/UH D/               9  0.0002058      5649  0.0019659      1444  0.0010938
^[OU]^L=/AH/               10  0.0002286       443  0.0001542       330  0.0002500
[OUP]=/UW P/                3  0.0000686       531  0.0001848       275  0.0002083
[OU]=/AW/                 107  0.0024466      8077  0.0028109      4168  0.0031572
[OY]=/OY/                  28  0.0006402      1137  0.0003957       685  0.0005189
[OING]=/OW IH NX/           3  0.0000686       422  0.0001469       216  0.0001636
[OI]=/OY/                  42  0.0009603      1903  0.0006623      1230  0.0009317
[OOR]=/AO R/               12  0.0002744       745  0.0002593       397  0.0003007
[OOK]=/UH K/               13  0.0002972      1948  0.0006779      1097  0.0008310
[OOD]=/UH D/               19  0.0004344      1847  0.0006428       901  0.0006825
[OO]=/UW/                  60  0.0013719      3764  0.0013099      1852  0.0014029
[O]E=/OW/                  20  0.0004573       772  0.0002687       360  0.0002727
[O] =/OW/                  49  0.0011204      7433  0.0025868      2319  0.0017566
[OA]=/OW/                  47  0.0010747      1964  0.0006835      1016  0.0007696
 [ONLY]=/OW N L IY/         1  0.0000229      1747  0.0006080       460  0.0003484
 [ONCE]=/W AH N S/          1  0.0000229       499  0.0001737       262  0.0001985
[ON'T]=/OW N T/             2  0.0000457       594  0.0002067       250  0.0001894
C[O]N=/AA/                179  0.0040928      7030  0.0024465      4843  0.0036685
[O]NG=/AO/                 22  0.0005030      2475  0.0008613      1451  0.0010991
 ^:[O]N=/AH/               57  0.0013033      2364  0.0008227      1432  0.0010847
I[ON]=/AX N/              362  0.0082771     14961  0.0052066      8533  0.0064637
#:[ON] =/AX N/             70  0.0016005      2648  0.0009215      1286  0.0009741
#^[ON]=/AX N/              23  0.0005259       691  0.0002405       459  0.0003477
[O]ST =/OW/                 8  0.0001829      2137  0.0007437       965  0.0007310
[OF]^=/AO F/               17  0.0003887      2065  0.0007186      1161  0.0008794
[OTHER]=/AH DH ER/         12  0.0002744      3231  0.0011244      1339  0.0010143
[OSS] =/AO S/               6  0.0001372       520  0.0001810       327  0.0002477
#^:[OM]=/AH M/             49  0.0011204      1627  0.0005662       931  0.0007052
[O]=/AA/                  850  0.0194352     51239  0.0178319     22065  0.0167141

Total                    3085  0.0705385    241964  0.0842067

Table 6 (continued)

STAT  Results  for  the  First  8000  Words  of  the  Brown  Corpus 
Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.  Relative 

Abs.  Relative 

Abs.  Relative 

***  PRULE  *** 


[PHJ-/F/ 

59 

0.001 3490 

1717 

0.0005975 

1031 

0.0007810 

[PEOPWP  IY  P/ 

3 

0.0000686 

902 

0.0003139 

326 

0.0002469 

[POWWP  AW/ 

6 

0.0001372 

535 

0.0001862 

277 

0.0002098 

[PUT]  */P  UH  T/ 

3 

0.0000686 

492 

0.0001712 

271 

0.0002053 

[PJ*/P/ 

1556 

0.0355779 

69000 

0.0240129 

43119 

0.0326623 

1627 

0.0372013 

72646 

0.0252818 

45024 

0.0341053 

***  QRULE  *** 

CQUARIVK  W AO  R/ 

7 

0.0001601 

314 

0.0001093 

198 

0.0001500 

[QUWK  W/ 

76 

0.0017377 

3287 

0.0011439 

2233 

0.0016915 

[Q]*/K/ 

2 

0.0000457 

35 

0.0000122 

3 

0.0000023 

85 

0.0019435 

3636 

0.0012654 

2434  0.0018437 

***  RRULE  *** 

• 

(HE]A#=«/R  IY/ 

186 

0.0042529 

8287 

0.0028840 

5727 

0.0043382 

[ R1-/R/ 

1497 

0.0342289' 

73680 

0.0256416 

41537 

0.0314639 

1683 

0.0384818 

81967 

0.0285256 

47264 

0.0358021 

***  S RULE  ★** 

CSHWSH/ 

177 

0.0040471 

10754 

0.0037425 

4981 

0.0037731 

#(SION ]*/ZH  AX  N/ 

23 

0.0005259 

972 

0.0003383 

643 

0.0004871 

[SOMEWS  AH  M/ 

12 

0.0002744 

2772 

0.0009647 

1162 

0.0008802 

#[SUR]#-/ZH  ER/ 

11 

0.0002515 

476 

0.0001657 

288 

0.0002182 

[SUR]#*/SH  ER/ 

10 

0.0002286 

709 

0.0002467 

452 

0.0003424 

#[SU]#=/ZH  UW/ 

5 

0.0001143 

416 

0.0001448 

291 

0.0002204 

#[SSU]#-/SH  UW/ 

5 

0.0001143 

322 

0.0001121 

178 

0.0001348 

#(SED]  */Z  0/ 

26 

0.0005945 

1686 

C. 0005867 

1090 

0.0008257 

#(S]#«/Z/ 

271 

0.0061964 

13840 

0.0048165 

8563 

0.0064864 

[SAIDWS  EH  0/ 

1 

0.0000229 

1961 

0.0006825 

317 

0.0002401 

AtSI0N]=/SH  AX  N/ 

43 

0.0009832 

1415 

0.0004924 

912 

0.0006908 

CS]S«/  / 

248 

0.0056705 

10255 

0.0035689 

6435 

0.0048745 

.tS]  »/Z/ 

512 

0.0117069 

21193 

0.0073754 

12390 

0.0093853 

#:.E[S]  =/Z/

138 

0.0031554 

5887 

0.0020488 

3662 

0.0027739 

#^:##[S]  =/Z/

107 

0.0024466 

4437 

0.0015441 

2487 

0.0018839 

#^:#[S]  =/S/

89 

0.0020350 

3773 

0.0013131 

1928 

0.0014604 

U[S]  =/S/

3 

0.0000686 

778 

0.0002708 

308 

0.0002333 

:#[S]  =/Z/

39

0.0008917

38870 

0.0135273 

3562 

0.0026982 

[SCH]=/S  K/

9 

0.0002058 

883 

0.0003073 

327 

0.0002477 

[S]C+=/  /

20 

0.0004573 

723 

0.0002516 

434 

0.0003288 

#[SM]=/Z  M/

26 

0.0005945 

514 

0.0001789 

289 

0.0002189 

#[SN]'=/Z  AX  N/

3 

0.0000686 

271 

0.0000943 

162 

0.0001227 

[S]=/S/

2063 

0.0471705 

104475 

0.0363587 

60294 

0.0456722 

3841 

0.0878244 

227382 

0.0791320 

111155 

0.0841990 


NRL  REPORT  7948 


Table  6 (continued) 

STAT  Results  for  the  First  8000  Words  of  the  Brown  Corpus 
Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs. 

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  TRULE  *** 

[THE]  =/DH  AX/

1 

0.0000229 

69971 

0.0243508 

500 

0.0003787 

[TO]  =/T  UW/

14 

0.0003201 

28177 

0.0098060 

1083 

0.0008204 

[THAT]  =/DH  AE  T/

2 

0.0000457 

10781 

0.0037519 

601 

0.0004553 

[THIS]  =/DH  IH  S/

1

0.0000229 

5146 

0.0017909 

495 

0.0003750 

[THEY]=/DH  EY/

5 

0.0001143 

3761 

0.0013089 

575 

0.0004356 

[THERE]=/DH  EH  R/

8 

0.0001829 

3142 

0.0010935 

741 

0.0005613 

[THER]=/DH  ER/

27 

0.0006174 

2408 

0.0008380 

1599 

0.0012112 

[THEIR]=/DH  EH  R/

2 

0.0000457 

2691 

0.0009365 

484 

0.0003666 

[THAN]  =/DH  AE  N/

1 

0.0000229 

1789 

0.0006226 

456 

0.0003454 

[THEM]  =/DH  EH  M/

1 

0.0000229 

1789 

0.0006226 

429 

0.0003250 

[THESE]  =/DH  IY  Z/

1

0.0000229 

1573 

0.0005474 

413 

0.0003128 

[THEN]=/DH  EH  N/

1 

0.0000229 

1377 

0.0004792 

408 

0.0003091 

[THROUGH]=/TH  R  UW/

2 

0.0000457 

1110 

0.0003863 

478 

0.0003621 

[THOSE]=/DH  OW  Z/

1 

0.0000229 

850 

0.0002958 

367 

0.0002780 

[THOUGH]  =/DH  OW/

2 

0.0000457 

761 

0.0002648 

439 

0.0003325 

[THUS]=/DH  AH  S/

1 

0.0000229 

312 

0.0001086 

180 

0.0001363 

[TH]=/TH/

191 

0.0043672 

19586 

0.0068162 

7526 

0.0057009 

#:[TED]  =/T  IH  D/

186

0.0042529

6418  0.0022335

4758 

0.0036041 

S[TI]#N=/CH/

12 

0.0002744 

756 

0.0002631 

428 

0.0003242 

[TI]O=/SH/

338 

0.0077284 

13438 

0.0046766 

7733 

0.0058577 

[TI]A=/SH/

17 

0.0003887 

603 

0.0002099 

419 

0.0003174 

[TIEN]=/SH  AX  N/

4 

0.0000915 

165 

0.0000574 

75 

0.0000568 

[TUR]#=/CH  ER/

55 

0.0012576 

2573 

0.0008954 

1519 

0.0011506 

[TU]A=/CH  UW/

15 

0.0003430 

858 

0.0002986 

579 

0.0004386 

[TWO]=/T  UW/

2

0.0000457

1424 

0.0004956 

440 

0.0003333 

[T]=/T/

3064 

0.0700583 

183179 

0.0637488 

93605 

0.0709050 

3954 

0.0904081  364638 

0.1268939

126330 

0.0956940 

***  URULE  *** 

[UN]I=/Y  UW  N/

15 

0.0003430 

1461 

0.0005084 

633 

0.0004795 

[UN]=/AH  N/

49

0.0011204

2462 

0.0008568 

1626 

0.0012317 

[UPON]=/AX  P  AO  N/

1 

0.0000229 

495 

0.0001723 

235 

0.0001780 

@[UR]#=/UH  R/

15 

0.0003430 

1084 

0.0003772 

555 

0.0004204 

[UR]#=/Y  UH  R/

26 

0.0005945 

980 

0.0003411 

656 

0.0004969 

[UR]=/ER/

109 

0.0024923 

4572 

0.0015911 

2832 

0.0021452 

[U]^  =/AH/

70 

0.0016005 

9270 

0.0032261 

2461 

0.0018642 

[U]^^=/AH/

366 

0.0083686 

17715 

0.0061651 

9963 

0.0075469 

[UY]=/AY/

5 

0.0001143 

182 

0.0000633 

116 

0.0000879 

G[U]#=/  /

16 

0.0003658 

470 

0.0001636 

325 

0.0002462 

G[U]%=/  /

11 

0.0002515 

270 

0.0000940 

191 

0.0001447 

G[U]#=/W/ 

9 

0.0002058 

278 

0.0000967 

172 

0.0001303 

#N[U]=/Y  UW/ 

25 

0.0005716 

1149 

0.0003999 

796 

0.0006030 

@[U]=/UW/

198 

0.0045273 

7998 

0.0027834 

4884 

0.0036996 

[U]=/Y  UW/

149 

0.0034069 

7024 

0.0024444 

3952 

0.0029936 

1064 

0.0243283 

55410 

0.0192834 

29397  0.0222680

Table  7

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  ARULE  ***

[A]  =/AX/

30 

0.0046069 

41 

0.0045480 

34 

0.0041474

[ARE]  =/AA  R/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[AR]O=/AX  R/

1 

0.0001536 

2 

0.0002219 

1 

0.0001220 

[AR]#=/EH  R/

10 

0.0015356 

15 

0.0016639 

14 

0.0017077

^[AS]#=/EY  S/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[A]WA=/AX/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

[AW]=/AO/

6

0.0009214

10

0.0011093 

9 

0.0010978 

:[ANY]=/EH  N  IY/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[A]^+#=/EY/

28 

0.0042998 

39 

0.0043261 

38 

0.0046353 

#:[ALLY]=/AX  L  IY/

4 

0.0006142 

4 

0.0004437 

4 

0.0004879 

[AL]#=/AX  L/

1 

0.0001536 

2 

0.0002219 

1 

0.0001220 

[AGAIN]=/AX  G  EH  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

#:[AG]E=/IH  JH/

3 

0.0004607 

4 

0.0004437 

4 

0.0004879 

[A]^+:#=/AE/

39 

0.0059889 

56 

0.0062119 

52 

0.0063430 

:[A]^+  =/EY/

6 

0.0009214 

8 

0.0008874 

7 

0.0008539 

[A]^%=/EY/

39 

0.0059889 

56 

0.0062119 

51 

0.0062210 

[ARR]=/AX  R/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[ARR]=/AE  R/

4 

0.0006142 

4 

0.0004437 

4 

0.0004879 

:[AR]  =/AA  R/

0

0.0000000

0 

0.0000000 

0 

0.0000000 

[AR]  =/ER/

4 

0.0006142 

6 

0.0006656 

5 

0.0006099 

[AR]=/AA  R/

33 

0.0050676 

44 

0.0048808 

41 

0.0050012

[AIR]=/EH  R/

5 

0.0007678 

7 

0.0007765 

7 

0.0008539 

[AI]=/EY/

12 

0.0018428 

19 

0.0021076 

17 

0.0020737 

[AY]=/EY/

12 

0.0018428 

18 

0.0019967 

17  0.0020737 

[AU]=/AO/

18

0.0027641

25

0.0027732

22 

0.0026836 

#:[AL]  =/AX  L/

22 

0.0033784 

27 

0.0029950 

26 

0.0031715 

#:[ALS]  =/AX  L  Z/

0 

0.0000000 

0 

0.0000000 

0 0.0000000 

[ALK]=/AO  K/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[AL]^=/AO  L/

24 

0.0036855 

32 

0.0035496 

31 

0.0037814 

:[ABLE]=/EY  B  AX  L/

2 

0.0003071 

4 

0.0004437 

4 

0.0004879 

[ABLE]=/AX  B  AX  L/

4 

0.0006142 

5 

0.0005546 

5 

0.0006099 

[ANG]+=/EY  N  JH/

1 

0.0001536 

2 

0.0002219 

1 

0.0001220 

[A]=/AE/

263 

0.0403870 

366 

0.0405990 

325 

0.0396438 

574 

0.0881450 

800 

0.0887410 

724  0.0883142

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


Rule 

No.  of 

Words  Matched

Total  Frequencies 
of  Words  Matched 

Total  No.  of  Texts 
for  Words  Matched 

Abs.

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  BRULE  ***

[BE]^#=/B  IH/

5 

0.0007678 

7 

0.0007765 

5 

0.0006099 

[BEING]=/B  IY  IH  NX/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[BOTH]  =/B  OW  TH/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[BUS]#=/B  IH  Z/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[BUIL]=/B  IH  L/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[B]=/B/

146 

0.0224201 

207 

0.0229617 

189 

0.0230544 

152 

0.0233415 

215 

0.0238491 

195 

0.0237863 

***  CRULE  *** 

[CH]^=/K/

1 

0.0001536 

2 

0.0002219 

1 

0.0001220 

^E[CH]=/K/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

[CH]=/CH/

38 

0.0058354 

57 

0.0063228 

45 

0.0054891 

S[CI]#=/S  AY/

0

0.0000000

0 

0.0000000 

0 

0.0000000 

[CI]A=/SH/

5 

0.0007678 

7 

0.0007765 

7 

0.0008539 

[CI]O=/SH/

2 

0.0003071 

4 

0.0004437 

4 

0.0004879 

[CI]EN=/SH/

1  0.0001536

1

0.0001109

1 

0.0001220 

[C]+=/S/

47

0.0072174

66

0.0073211

60 

0.0073189 

[CK]=/K/

33 

0.0050676 

46 

0.0051026 

45 

0.0054891 

[COM]%=/K  AH  M/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[C]=/K/

174 

0.0267199 

251 

0.0278425 

225 

0.0274457 

303 

0.0465295 

437 

0.0484748 

391 

0.0476946 

***  DRULE  ***

#:[DED]  =/D  IH  D/

8

0.0012285

12

0.0013311

11 

0.0013418 

.E[D]  =/D/

37 

0.0056818 

55 

0.0061009 

52 

0.0063430 

#^:E[D]  =/T/

12 

0.0018428 

18 

0.0019967 

18 

0.0021957 

[DE]^#=/D  IH/

13 

0.0019963 

19 

0.0021076 

17 

0.0020737 

[DO]  =/D  UW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[DOES]=/D  AH  Z/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[DOING]=/D  UW  IH  NX/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[DOW]=/D  AW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[DU]A=/JH  UW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[D]=/D/

201 

0.0308661 

275 

0.0305047 

251 

0.0306172 

271 

0.0416155 

379 

0.0420410 

349 

0.0425714 

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.  Relative

Abs.  Relative

Abs.  Relative

***  ERULE  *** 

#:[E]  =/  /

108  0.0165848 

149  0.0165280 

136  0.0165894 

'^:[E]  =/  /

0 0.0000000 

0 0.0000000 

0 0.0000000 

:[E]  =/IY/

4 0.0006142 

6 0.0006656 

6 0.0007319 

#[ED]  =/D/

2 0.0003071 

3 0.0003328 

3 0.0003659 

#:[E]D  =/  /

45  0.0069103 

68  0.0075430 

65  0.0079288 

[EV]ER=/EH  V/

2 0.0003071 

3 0.0003328 

3 0.0003659 

[E]^%=/IY/

17  0.0026106 

30  0.0033278 

28  0.0034155 

[ERI]#=/IY  R  IY/

0 0.0000000

0 0.0000000

0 0.0000000

[ERI]=/EH  R  IH/

4 0.0006142 

4 0.0004437 

4 0.0004879 

#:[ER]#=/ER/

11  0.0016892 

15  0.0016639 

15  0.0018297 

[ER]#=/EH  R/

2 0.0003071 

4 0.0004437 

4 0.0004879 

[ER]=/ER/

105  0.0161241 

153  0.0169717 

135  0.0164674 

[EVEN]=/IY  V  EH  N/

0 0.0000000 

0 0.0000000 

0 0.0000000 

#:[E]W=/  /

1 0.0001536 

1 0.0001109 

1 0.0001220 

@[EW]=/UW/

6 0.0009214 

10  0.0011093 

8 0.0009758 

[EW]=/Y  UW/

0 0.0000000

0 0.0000000

0 0.0000000

[E]O=/IY/

6 0.0009214 

9 0.0009983 

8 0.0009758 

#:&[ES]  =/IH  Z/

11  0.0016892

15  0.0016639 

12  0.0014638 

#:[E]S  =/  /

38  0.0058354 

53  0.0058791 

52  0.0063430 

#:[ELY]  =/L  IY/

5 0.0007678 

8 0.0008874 

8 0.0009758 

#:[EMENT]=/M  EH  N  T/

2 0.0003071 

3 0.0003328 

3 0.0003659 

[EFUL]=/F  UH  L/

1 0.0001536 

2 0.0002219 

2 0.0002440 

[EE]=/IY/

22  0.0033784 

29  0.0032169 

25  0.0030495 

[EARN]=/ER  N/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[EAR]^=/ER/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[EAD]=/EH  D/

4 0.0006142

7 0.0007765

7 0.0008539

#:[EA]  =/IY  AX/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[EA]SU=/EH/

0 0.0000000

0 0.0000000

0 0.0000000

[EA]=/IY/

36  0.0055283 

46  0.0051026 

44  0.0053672 

[EIGH]=/EY/

1 0.0001536 

2 0.0002219 

2 0.0002440 

[EI]=/IY/

5 0.0007678 

7 0.0007765 

6 0.0007319 

[EYE]=/AY/

1 0.0001536 

1 0.0001109 

1 0.0001220 

[EY]=/IY/

15  0.0023034 

18  0.0019967 

17  0.0020737 

[EU]=/Y  UW/

7 0.0010749

10  0.0011093

8 0.0009758

[E]=/EH/

301  0.0462224

403  0.0447033

369  0.0450110

762  0.1170147 

1059  0.1174709 

972  0.1185655

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.  Relative 

Abs. 

Relative 

Abs. 

Relative 

***  FRULE  ***

[FUL]=/F  UH  L/

4

0.0006142

7

0.0007765

7

0.0008539

[F]=/F/

115

0.0176597

159

0.0176373

145

0.0176872

119 

0.0182740 

166 

0.0184138 

152 

0.0185411 

***  GRULE  ***

[GIV]=/G  IH  V/

0 

0.0000000 

0 

0.0000000

0 

0.0000000 

[G]I^=/G/

3

0.0004607

4

0.0004437 

4 

0.0004879 

[GE]T=/G  EH/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

SU[GGES]=/G  JH  EH  S/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[GG]=/G/

4 

0.0006142 

5 

0.0005546 

4 

0.0004879 

B#[G]=/G/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[G]+=/JH/

34 

0.0052211 

49 

0.0054354 

43 

0.0052452 

[GREAT]=/G  R  EY  T/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

#[GH]=/  /

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[G]=/G/

73

0.0112101

101 

0.0112035 

91 

0.0111003 

116 

0.0178133 

161 

0.0178591

144 

0.0175653 

***  HRULE  *** 

[HAV]=/HH  AE  V/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[HERE]=/HH  IY  R/

0

0.0000000

0

0.0000000

0 

0.0000000 

[HOUR]=/AW  ER/

1 

0.0001536 

2 

0.0002219 

1 

0.0001220 

[HOW]=/HH  AW/

1 

0.0001536 

2 

0.0002219 

1 

0.0001220 

[H]#=/HH/

58 

0.0089066 

76 

0.0084304 

70 

0.0085387 

[H]=/  /

10 

0.0015356 

12 

0.0013311 

11 

0.0013418 

70  0.0107494

92  0.0102052

83  0.0101244

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs. 

Relative 

Abs. 

Relative 

Abs. 

Relative

***  IRULE  ***

[IN]=/IH  N/

27 

0.0041462 

36 

0.0039933 

35 

0.0042693 

[I]  =/AY/

2 

0.0003071 

3 

0.0003328 

2 

0.0002440 

[IN]D=/AY  N/

4 

0.0006142 

6 

0.0006656 

4 

0.0004879 

[IER]=/IY  ER/

6

0.0009214

9 

0.0009983 

8 

0.0009758 

#:R[IED]  =/IY  D/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[IED]  =/AY  D/

7 

0.0010749 

7 

0.0007765 

7 

0.0008539 

[IEN]=/IY  EH  N/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[IE]T=/AY  EH/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

:[I]%=/AY/

1

0.0001536

1

0.0001109

1

0.0001220

[I]%=/IY/

17 

0.0026106 

27 

0.0029950 

27 

0.0032935 

[IE]=/IY/

11

0.0016892

16

0.0017748

14

0.0017077

[I]^+:#=/IH/

56 

0.0085995 

71 

0.0078758 

67 

0.0081727 

[IR]#=/AY  R/

4 

0.0006142 

6 

0.0006656 

6 

0.0007319 

[IZ]%=/AY  Z/

7 

0.0010749 

9 

0.0009983 

8 

0.0009758 

[IS]%=/AY  Z/

4 

0.0006142 

5 

0.0005546 

5 

0.0006099 

[I]D%=/AY/

4 

0.0006142 

7 

0.0007765 

6 

0.0007319 

+^[I]^+=/IH/

11

0.0016892

16

0.0017748

14 

0.0017077 

[I]T%=/AY/

6

0.0009214

7 

0.0007765 

7 

0.0008539

#^:[I]^+=/IH/

20 

0.0030713 

29 

0.0032169 

25 

0.0030495 

[I]^+=/AY/

16 

0.0024570 

25 

0.0027732 

20  0.0024396 

[IR]=/ER/

12 

0.0018428 

18 

0.0019967 

16 

0.0019517 

[IGH]=/AY/

6 

0.0009214 

8 

0.0008874 

7 

0.0008539 

[ILD]=/AY  L  D/

0

0.0000000

0

0.0000000

0 

0.0000000 

[IGN]  =/AY  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[IGN]^=/AY  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[IGN]%=/AY  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[IQUE]=/IY  K/

3 

0.0004607 

6 

0.0006656 

5 

0.0006099 

[I]=/IH/

356 

0.0546683 

493 

0.0546866 

445 

0.0542815 

584 

0.0896806 

810 

0.0898502 

734 

0.0895340 

***  JRULE  ***

[J]=/JH/

20  0.0030713 

28  0.0031059 

24  0.0029275 

20  0.0030713 

28  0.0031059 

24  0.0029275 

***  KRULE  ***

[K]N=/  /

2 0.0003071 

3 0.0003328 

3 0.0003659 

[K]=/K/

62  0.0095209 

81  0.0089850 

71  0.0086606 

64  0.0098280 


84  0.0093178 


74  0.0090266 


Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 



No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.  Relative

Abs.  Relative 

Abs.  Relative 

***  LRULE  ***

[LO]C#=/L  OW/

0 0.0000000 

0 0.0000000 

0 0.0000000 

L[L]=/  /

38  0.0058354 

51  0.0056572 

50  0.0060990 

#^:[L]%=/AX  L/

21  0.0032248 

26  0.0028841 

23  0.0028056 

[LEAD]=/L  IY  D/

2 0.0003071 

2 0.0002219 

2 0.0002440 

[L]=/L/

297  0.0456081 

422  0.0468109 

387  0.0472066 

358  0.0549754 

501  0.0555740 

462  0.0563552 

***  MRULE  ***

[MOV]=/M  UW  V/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[M]=/M/

231  0.0354730 

317  0.0351636 

287  0.0350085 

231  0.0354730 

317  0.0351636 

287  0.0350085 

***  NRULE  *** 

E[NG]+=/N  JH/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[NG]R=/NX  G/

2 0.0003071 

3 0.0003328 

2 0.0002440 

[NG]#=/NX  G/

6 0.0009214 

8 0.0008874 

6 0.0007319 

[NGL]%=/NX  G  AX  L/

2 0.0003071 

3 0.0003328 

3 0.0003659 

[NG]=/NX/

84  0.0128993 

113  0.0125347 

110  0.0134179 

[NK]=/NX  K/

8 0.0012285

11  0.0012202

10  0.0012198 

[NOW]  =/N  AW/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[N]=/N/

359  0.0551290 

490  0.0543539 

443  0.0540376 

461  0.0707924 

628  0.0696617 

574  0.0700171

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs. 

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  ORULE  *** 

[OF]  =/AX  V/ 

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

[OROUGH]=/ER  OW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

#:[OR]  =/ER/

4 

0.0006142 

5 

0.0005546 

4 

0.0004879 

#:[ORS]  =/ER  Z/

3 

0.0004607 

5 

0.0005546 

5 

0.0006099 

[OR]=/AO  R/

55 

0.0084459 

87 

0.0096506 

72 

0.0087826 

[ONE]=/W  AH  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[OW]=/OW/

25 

0.0038391 

31 

0.0034387 

29 

0.0035374 

[OVER]=/OW  V  ER/

4 

0.0006142 

5 

0.0005546 

5 

0.0006099 

[OV]=/AH  V/

11

0.0016892 

17  0.0018857 

13 

0.0015858 

[O]^%=/OW/

11

0.0016892

13 

0.0014420 

13 

0.0015858 

[O]^EN=/OW/

5 

0.0007678 

7 

0.0007765 

7 

0.0008539 

[O]^I#=/OW/

12 

0.0018428 

19 

0.0021076 

16 

0.0019517 

[OL]D=/OW  L/

2 

0.0003071 

2 

0.0002219 

2 

0.0002440 

[OUGHT]=/AO  T/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[OUGH]=/AH  F/

1 

0.0001536 

2 

0.0002219 

2 

0.0002440 

[OU]=/AW/

3 

0.0004607 

5 

0.0005546 

4 

0.0004879 

H[OU]S#=/AW/

4 

0.0006142 

4 

0.0004437 

4 

0.0004879 

[OUS]=/AX  S/

8

0.0012285

11

0.0012202

11

0.0013418 

[OUR]=/AO  R/

3 

0.0004607 

5 

0.0005546 

5 

0.0006099 

[OULD]=/UH  D/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000

^[OU]^L=/AH/

1

0.0001536

1

0.0001109

1 

0.0001220 

[OUP]=/UW  P/

1

0.0001536

1

0.0001109

1 

0.0001220 

[OU]=/AW/

16 

0.0024570 

23 

0.0025513 

22 

0.0026836 

[OY]=/OY/

3 

0.0004607 

4 

0.0004437 

4 

0.0004879 

[OING]=/OW  IH  NX/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[OI]=/OY/

9

0.0013821

10

0.0011093

9

0.0010978

[OOR]=/AO  R/

2

0.0003071

3 

0.0003328 

3 

0.0003659 

[OOK]=/UH  K/

3 

0.0004607 

4 

0.0004437 

4 

0.0004879 

[OOD]=/UH  D/

3 

0.0004607 

5 

0.0005546 

5 

0.0006099 

[OO]=/UW/

16

0.0024570

18

0.0019967 

18 

0.0021957 

[O]E=/OW/

1

0.0001536

1

0.0001109

1

0.0001220

[O]  =/OW/

32 

0.0049140 

44 

0.0048808 

36 

0.0043913 

[OA]=/OW/

8 

0.0012285 

9 

0.0009983 

9 

0.0010978 

[ONLY]=/OW  N  L  IY/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[ONCE]=/W  AH  N  S/

1 

0.0001536 

1 

0.0001109 

1

0.0001220 

[ON'T]=/OW  N  T/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

C[O]N=/AA/

17 

0.0026106 

24 

0.0026622 

21 

0.0025616 

[O]NG=/AO/

3 

0.0004607 

4 

0.0004437 

4 

0.0004879 

^:[O]N=/AH/

14

0.0021499

21

0.0023294

20 

0.0024396 

I[ON]=/AX  N/

25 

0.0038391 

33 

0.0036606 

32 

0.0039034 

#:[ON]  =/AX  N/

19

0.0029177

26 

0.0028841 

25 

0.0030495 

#^[ON]=/AX  N/

10 

0.0015356 

14 

0.0015530 

12 

0.0014638 

[O]ST  =/OW/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[OF]^=/AO  F/

2 

0.0003071 

2 

0.0002219 

2 

0.0002440 

[OTHER]=/AH  DH  ER/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[OSS]  =/AO  S/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

#^:[OM]=/AH  M/

8 

0.0012285 

13 

0.0014420 

10 

0.0012198 

[O]=/AA/

122

0.0187346 

165 

0.0183028 

151 

0.0184191 

471 

0.0723280 

649 

0.0719911

588 

0.0717248 


ELOVITZ,  JOHNSON,  McHUGH,  AND  SHORE 


Table  7 (continued) 

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  PRULE  *** 

[PH]=/F/

21 

0.0032248 

29 

0.0032169 

27 

0.0032935 

[PEOP]=/P  IY  P/

1 

0.0001536 

1 

0.0001109 

1 

0.0001220 

[POW]=/P  AW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[PUT]  =/P  UH  T/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[P]=/P/

194 

0.0297912 

269 

0.0298392 

241 

0.0293974 

216 

0.0331695 

299 

0.0331669 

269 

0.0328129 

***  QRULE  *** 

[QUAR]=/K  W  AO  R/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[QU]=/K  W/

10 

0.0015356 

14 

0.0015530 

13 

0.0015858 

[Q]=/K/

1 

0.0001536 

2 

0.0002219 

2 

0.0002440 

11 

0.0016892 

16 

0.0017748 

15 

0.0018297 

***  RRULE  *** 


[RE]^#=/R  IY/

14

0.0021499

19

0.0021076

18

0.0021957

[R]=/R/

228

0.0350123

315

0.0349418

280

0.0341547

242 

0.0371622 

334 

0.0370494 

298 

0.0363503 

***  SRULE  *** 

[SH]=/SH/

25 

0.0038391 

31 

0.0034387 

30 

0.0036594 

#[SION]=/ZH  AX  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[SOME]=/S  AH  M/

2 

0.0003071 

4 

0.0004437 

3 

0.0003659 

#[SUR]#=/ZH  ER/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[SUR]#=/SH  ER/

1 

0.0001536 

2 

0.0002219 

2 

0.0002440 

#[SU]#=/ZH  UW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

#[SSU]#=/SH  UW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

#[SED]  =/Z  D/

3 

0.0004607 

4 

0.0004437 

4 

0.0004879 

#[S]#=/Z/

43 

0.0066032 

59 

0.0065446 

56 

0.0068309 

[SAID]=/S  EH  D/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

^[SION]=/SH  AX  N/

5 

0.0007678 

7 

0.0007765 

6 

0.0007319 

[S]S=/  /

33

0.0050676

42 

0.0046589 

39 

0.0047573 

.[S]  =/Z/

63 

0.0096744 

85 

0.0094287 

77 

0.0093925 

#:.E[S]  =/Z/

16 

0.0024570 

22 

0.0024404 

21 

0.0025616 

#^:##[S]  =/Z/

20 

0.0030713 

33 

0.0036606 

31 

0.0037814 

#^:#[S]  =/S/

19

0.0029177

25

0.0027732

21 

0.0025616 

U[S]  =/S/

1

0.0001536

2

0.0002219

2

0.0002440

:#[S]  =/Z/

8

0.0012285

10

0.0011093

9

0.0010978

[SCH]=/S  K/

2 

0.0003071 

3 

0.0003328 

2 

0.0002440 

[S]C+=/  /

4

0.0006142

6

0.0006656

5

0.0006099

#[SM]=/Z  M/

7

0.0010749

11

0.0012202 

10 

0.0012198 

#[SN]'=/Z  AX  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[S]=/S/

307 

0.0471437 

418 

0.0463672 

378 

0.0461088 

559 

0.0858415 

764 

0.0847476 

696 

0.0848988

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs. 

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  TRULE  ***

[THE]  =/DH  AX/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

[TO]  =/T  UW/

2 

0.0003071 

3 

0.0003328 

2 

0.0002440 

[THAT]  =/DH  AE  T/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THIS]  =/DH  IH  S/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THEY]=/DH  EY/

0

0.0000000

0

0.0000000

0 

0.0000000 

[THERE]=/DH  EH  R/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THER]=/DH  ER/

3 

0.0004607 

5 

0.0005546 

5 

0.0006099 

[THEIR]=/DH  EH  R/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THAN]  =/DH  AE  N/

1 

0.0001536 

2 

0.0002219 

2 

0.0002440 

[THEM]  =/DH  EH  M/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THESE]  =/DH  IY  Z/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THEN]=/DH  EH  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THROUGH]=/TH  R  UW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THOSE]=/DH  OW  Z/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THOUGH]  =/DH  OW/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[THUS]=/DH  AH  S/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[TH]=/TH/

28 

0.0042998 

37 

0.0041043 

35 

0.0042693 

#:[TED]  =/T  IH  D/

11

0.0016892 

17 

0.0018857 

15 

0.0018297 

S[TI]#N=/CH/

1

0.0001536

1

0.0001109 

1 

0.0001220 

[TI]O=/SH/

20 

0.0030713 

28 

0.0031059 

27 

0.0032935 

[TI]A=/SH/

2 

0.0003071 

2 

0.0002219 

2 

0.0002440 

[TIEN]=/SH  AX  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[TUR]#=/CH  ER/

3 

0.0004607 

6 

0.0006656 

4 

0.0004879 

[TU]A=/CH  UW/

4 

0.0006142 

6 

0.0006656 

6 

0.0007319 

[TWO]=/T  UW/

2 

0.0003071 

2 

0.0002219 

2 

0.0002440 

[T]=/T/

406 

0.0623464 

555 

0.0615641 

506 

0.0617224 

485 

0.0744779 

667 

0.0739878 

610 

0.0744084 

***  URULE  ***

[UN]I=/Y  UW  N/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

[UN]=/AH  N/

17 

0.0026106 

23 

0.0025513 

21 

0.0025616 

[UPON]=/AX  P  AO  N/

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

@[UR]#=/UH  R/

4 

0.0006142 

5 

0.0005546 

4 

0.0004879 

[UR]#=/Y  UH  R/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

[UR]=/ER/

17

0.0026106

23 

0.0025513 

20 

0.0024396 

[U]^  =/AH/

13

0.0019963

16

0.0017748

15 

0.0018297 

[U]^^=/AH/

59

0.0090602

84

0.0093178

76 

0.0092706 

[UY]=/AY/

2 

0.0003071 

3 

0.0003328 

3 

0.0003659 

G[U]#=/  /

1

0.0001536

1

0.0001109

1

0.0001220

G[U]%=/  /

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

G[U]#=/W/

2 

0.0003071 

2 

0.0002219 

2 

0.0002440 

#N[U]=/Y  UW/

3 

0.0004607 

3 

0.0003328 

3 

0.0003659 

@[U]=/UW/

23

0.0035319

31 

0.0034387 

27 

0.0032935 

[U]=/Y  UW/

19 

0.0029177 

27 

0.0029950 

25 

0.0030495 

164  0.0251843 


224  0.0248475 


203  0.0247621

Table  7 (continued)

STAT  Results  for  the  1000-Word  Sample  from  the  Low-Frequency  End  of  the 
Brown  Corpus  Translated  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs.  Relative

Abs.  Relative

Abs.  Relative

***  VRULE  ***

[VIEW]=/V  Y  UW/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[V]=/V/

66  0.0101351 

91  0.0100943 

85  0.0103684 

66  0.0101351 

91  0.0100943 

85  0.0103684 

***  WRULE  ***

[WERE]=/W  ER/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[WA]S=/W  AA/

1 0.0001536 

1 0.0001109 

1 0.0001220 

[WA]T=/W  AA/

2 0.0003071 

3 0.0003328 

3 0.0003659 

[WHERE]=/WH  EH  R/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[WHAT]=/WH  AA  T/

1 0.0001536 

1 0.0001109 

1 0.0001220 

[WHOL]=/HH  OW  L/

1 0.0001536 

1 0.0001109 

1 0.0001220 

[WHO]=/HH  UW/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[WH]=/WH/

5 0.0007678 

7 0.0007765 

6 0.0007319 

[WAR]=/W  AO  R/

2 0.0003071 

3 0.0003328 

2 0.0002440 

[WOR]^=/W  ER/

7 0.0010749 

8 0.0008874 

8 0.0009758 

[WR]=/R/ 

1 0.0001536 

1 0.0001109 

1 0.0001220 

[W]=/W/

41  0.0062961 

56  0.0062119 

55  0.0067090 

61  0.0093673 

81  0.0089850 

78  0.0095145 

***  XRULE  ***

[X]=/K  S/

19  0.0029177

26  0.0028841 

23  0.0028056 

19  0.0029177 

26  0.0028841 

23  0.0028056 

***  YRULE  ***

[YOUNG]=/Y  AH  NX/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[YOU]=/Y  UW/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[YES]=/Y  EH  S/

0 0.0000000 

0 0.0000000 

0 0.0000000 

[Y]=/Y/

4 0.0006142 

6 0.0006656 

6 0.0007319 

#^:[Y]  =/IY/

67  0.0102887 

97  0.0107598 

91  0.0111003 

#^:[Y]I=/IY/

3 0.0004607 

3 0.0003328 

3 0.0003659 

:[Y]  =/AY/

2 0.0003071 

2 0.0002219 

2 0.0002440 

:[Y]#=/AY/

2 0.0003071 

3 0.0003328 

2 0.0002440 

:[Y]^+:#=/IH/

4 0.0006142 

6 0.0006656 

5 0.0006099 

:[Y]^#=/AY/

5 0.0007678 

6 0.0006656 

5 0.0006099 

[Y]=/IH/

21  0.0032248

30  0.0033278

25  0.0030495 

108  0.0165848 

153  0.0169717 

139  0.0169554 


***  ZRULE  *** 
[Z]=/Z/


25  0.0038391 
25  0.0038391 


6512  1.0 


34  0.0037715 
34  0.0037715 


29  0.0035374 
29  0.0035374 


=====  =========

9015  1.0 


=====  ========= 

8198  1.0 
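
The figures in Table 7 are consistent with two simple relationships: each Relative entry is the absolute count divided by the grand total of its column (6512, 9015, and 8198 on the last row of the table), and each rule-group subtotal is the sum of the rows above it. The following Python sketch checks both on a few rows transcribed from Table 7. It is an illustration only; the function and variable names are hypothetical and are not taken from the STAT program.

```python
# Illustrative check of the Table 7 arithmetic. All names here are
# hypothetical; they do not come from the STAT program in this report.

# Grand totals from the last row of Table 7:
# (words matched, total frequencies, total texts).
TOTALS = (6512, 9015, 8198)


def relative(abs_counts, totals=TOTALS):
    """Divide each absolute count by its column total, rounded to 7 places."""
    return tuple(round(a / t, 7) for a, t in zip(abs_counts, totals))


def subtotal(rows):
    """Sum the absolute counts of a rule group column by column."""
    return tuple(sum(col) for col in zip(*rows.values()))


# Absolute counts for the three Q rules, as listed in Table 7.
Q_ROWS = {
    "[QUAR]=/K W AO R/": (0, 0, 0),
    "[QU]=/K W/": (10, 14, 13),
    "[Q]=/K/": (1, 2, 2),
}

q_sub = subtotal(Q_ROWS)
print("Q subtotal (absolute):", q_sub)
print("Q subtotal (relative):", relative(q_sub))
print("[A] =/AX/ (relative): ", relative((30, 41, 34)))
```

The same check applies to any other rule group in the table, which makes it a convenient way to spot digits that were damaged in reproduction.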


Table  8

STAT  Results  for  the  First  8000  Words  of  the  Brown  Corpus  and  the 
IPA-to-Votrax  Phase  of  the  Translation  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs. 

Relative 

Abs. 

Relative 

Abs. 

Relative

***  IYRULE  *** 


[IY]=[E]

1721  0.0363433 

119333  0.0353984 

58706 

0.0400941 

1721  0.0363433 

119333  0.0353984 

58706 

0.0400941 

***  IHRULE  ***

3609  0.0762132 

224302  0.0665360 

99065 

0.0676579 

3609  0.0762132 

224302  0.0665 360 

99065 

0.0676579 

***  EYRULE  ***

L [EY]  R=[UH3  A1  I3]

0  0.0000000

0  0.0000000

0  0.0000000

L [EY]=[UH3  A1  AY]

98  0.0020695

4764  0.0014132

2947  0.0020127

[EY]  R=[A  I3]

2  0.0000422

81  0.0000240

45  0.0000307

[EY]=[A  AY]

779  0.0164506

46357  0.0137511

24556  0.0167709

879  0.0185623 

51202  0.0151883 

27548 

0.0188143 

***  EHRULE  ***

L [EH]=[UH3  EH]

178  0.0037589

7349  0.0021800

4476  0.0030569

[EH]=[EH]

2290  0.0483592

126746  0.0375974

70661  0.0482590

2468  0.0521181 

134095  0.0397774 

75137 

0.0513159 

***  AERULE  ***

L [AE]  R=[UH3  AE  EH3]

0  0.0000000

0  0.0000000

0  0.0000000

L [AE]=[UH3  AE]

129  0.0027242

5700  0.0016908

2961  0.0020223

[AE]  R=[AE1  EH3]

22  0.0004646

841  0.0002495

544  0.0003715

[AE]=[AE]

1554  0.0328167

137131  0.0406780

43298  0.0295710

1705  0.0360054 

143672  0.0426182

46803 

0.0319648 

***  AARULE  ***

[AA]=[AH]

1268  0.0267770

86985  0.0258029 

35471 

0.0242254 

1268  0.0267770 

86985  0.0258029 

35471 

0.0242254 

Table  8 (continued)

STAT  Results  for  the  First  8000  Words  of  the  Brown  Corpus  and  the 
IPA-to-Votrax  Phase  of  the  Translation  by  Version  3 of  the  Rules 


No.  of 

Total  Frequencies 

Total  No.  of  Texts 

Rule 

Words  Matched 

of  Words  Matched 

for  Words  Matched 

Abs. 

Relative 

Abs. 

Relative 

Abs. 

Relative 

***  AO  RULE  *** 


L [ A01  R-IUH3  03 

21 

0.0004435 

633 

0.0001878 

317 

0.0002165 

L (AOJ  EfMUH3  AN  02] 

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

L t A0]*[UH3  AN] 

24 

0.0005068 

2222 

0.0006591 

1235 

0.0008435 

IAO]  R*[QJ 

422 

0.00891 16 

36342 

0.010/803 

13460 

0.0091927 

[AO]  E1MAW  02] 

0 

0.0000000 

0 

0.0000000 

0 

0.0000000 

[ AO ) * [ AN  1 

22/ 

0.0047937 

19317 

0.0057301 

9789 

0.0066855 

694 

0.0146556 

58514 

0.0173573 

24801 

0.0169382 

***  ONRULE 

★** 

L [ON ]=[ UH3  01  U! ] 

78 

0.0016472 

4100 

0.0012162 

2766 

0.0018891 

C0N]=.[Ol  Ul] 

424 

0.0089538 

35818 

0.0106249 

17358 

0.0118549 

502 

0.0106010 

39918 

0.0118411 

20124 

0.0137440 

***  UHRULE 

*** 

L (UH]*(UH3  00] 

8 

0.0001689 

1282 

0.0003803 

729 

0.0004979 

(UH]*(003 

113 

0.0023863 

12042 

0.0035721 

5245 

0.0035822 

121 

0.0025552 

13324 

0.0039524 

5974 

0.0040800 

***  UN  RULE 

★** 

[UN]=(1U  U] 

576 

0.0121637 

68818 

0.0204139 

20244 

0.0138259 

576 

0.0121637 

68818 

0.0204139 

20244 

0.0138259 

***  E RRULE 

IY  [ER]*(I3  ER] 

14 

0.0002956 

551 

0.0001634 

342 

0.0002336 

ER  [ ER]*( IU  R] 

5 

0.0001056 

115 

0.0000341 

56 

0.0000382 

L [ ER]»( UH3  ER] 

45 

0..  0009503 

1730 

0.0005132 

1141 

0.0007793 

[ER3  L=*[ UH3  ER] 

21 

0.0004435 

2081 

0.0006173 

1065 

0.0007274 

R ( ER]*( UH3  R] 

9 

0.000t90i 

210 

0.0000623 

152 

0.0001038 

[ ER  3 = t E R] 

1090 

0.0230181 

66563 

0.0197450 

35008 

0.0239092 

1184 

0.0250032 

71250 

0.021 1353 

37764 

0.0257915 

***  AXRULE 

IAX]=[UH2] 

1195 

0.0252355 

183722 

0.0544985 

32803 

0.0224033 

1195 

0.0252355 

1 83722 

0.0544985 

32803 

0.0224033 

36 
Table 8 (continued)

                          No. of              Total Frequencies     Total No. of Texts
                          Words Matched       of Words Matched      for Words Matched
Rule                       Abs.   Relative       Abs.   Relative       Abs.   Relative

*** AHRULE ***
[AH]=[UH]                   725   0.0153102     51178   0.0151812     24431   0.0166855
  Group total               725   0.0153102     51178   0.0151812     24431   0.0166855

*** AYRULE ***
[AY] L=[AH AY]               32   0.0006758      2564   0.0007606      1345   0.0009186
[AY] R=[AH I3]               60   0.0012671      2188   0.0006490      1417   0.0009678
[AY] ER=[AH AY]               1   0.0000211       160   0.0000475        88   0.0000601
[AY]=[AH E1]                396   0.0083625     39495   0.0117156     16799   0.0114731
  Group total               489   0.0103265     44407   0.0131727     19649   0.0134196

*** AWRULE ***
[AW]=[AH (illegible)]       151   0.0031887     17619   0.0052264      7661   0.0052322
  Group total               151   0.0031887     17619   0.0052264      7661   0.0052322

*** OYRULE ***
L [OY] ER=[UH3 O1 AY]         2   0.0000422        32   0.0000095        23   0.0000157
L [OY] L=[UH3 O1 AY]          0   0.0000000         0   0.0000000         0   0.0000000
L [OY] R=[UH3 O1 EH2]         0   0.0000000         0   0.0000000         0   0.0000000
[OY] ER=[O1 AY]               0   0.0000000         0   0.0000000         0   0.0000000
[OY] L=[O1 AY]                9   0.0001901       244   0.0000724       123   0.0000840
[OY] R=[O1 EH2]               0   0.0000000         0   0.0000000         0   0.0000000
[OY]=[O1 E1]                 59   0.0012459      2764   0.0008199      1769   0.0012082
  Group total                70   0.0014782      3040   0.0009018      1915   0.0013079

*** YRULE ***
[Y]=[Y1]                    275   0.0058073     20191   0.0059894      9198   0.0062819
  Group total               275   0.0058073     20191   0.0059894      9198   0.0062819

*** PRULE ***
[P]=[P]                    1575   0.0332601     72857   0.0216120     44829   0.0306166
  Group total              1575   0.0332601     72857   0.0216120     44829   0.0306166

*** BRULE ***
[B]=[B]                     826   0.0174431     59010   0.0175045     25132   0.0171643
  Group total               826   0.0174431     59010   0.0175045     25132   0.0171643

Table 8 (continued)

                          No. of              Total Frequencies     Total No. of Texts
                          Words Matched       of Words Matched      for Words Matched
Rule                       Abs.   Relative       Abs.   Relative       Abs.   Relative

*** TRULE ***
[T]=[T]                    3468   0.0732356    242828   0.0720315    108215   0.0739070
  Group total              3468   0.0732356    242828   0.0720315    108215   0.0739070

*** DRULE ***
[D]=[D]                    2179   0.0460151    150389   0.0446108     70118   0.0478881
  Group total              2179   0.0460151    150389   0.0446108     70118   0.0478881

*** KRULE ***
[K]=[K]                    2174   0.0459095    101571   0.0301296     59818   0.0408536
  Group total              2174   0.0459095    101571   0.0301296     59818   0.0408536

*** GRULE ***
[G]=[G]                     459   0.0096929     24315   0.0072127     13679   0.0093423
  Group total               459   0.0096929     24315   0.0072127     13679   0.0093423

*** FRULE ***
[F]=[F]                     853   0.0180133     64428   0.0191116     30329   0.0207136
  Group total               853   0.0180133     64428   0.0191116     30329   0.0207136

*** VRULE ***
[V]=[V]                     690   0.0145711     75264   0.0223260     22479   0.0153524
  Group total               690   0.0145711     75264   0.0223260     22479   0.0153524

*** THRULE ***
[TH]=[TH]                   194   0.0040968     21426   0.0063557      8341   0.0056966
  Group total               194   0.0040968     21426   0.0063557      8341   0.0056966

*** DHRULE ***
[DH]=[THV]                   66   0.0013938    109582   0.0325059      9026   0.0061644
  Group total                66   0.0013938    109582   0.0325059      9026   0.0061644

Table 8 (continued)

                          No. of              Total Frequencies     Total No. of Texts
                          Words Matched       of Words Matched      for Words Matched
Rule                       Abs.   Relative       Abs.   Relative       Abs.   Relative

*** SRULE ***
[S]=[S]                    2927   0.0618110    156066   0.0462948     87361   0.0596645
  Group total              2927   0.0618110    156066   0.0462948     87361   0.0596645

*** ZRULE ***
[Z]=[Z]                    1446   0.0305360    100188   0.0297193     39797   0.0271800
  Group total              1446   0.0305360    100188   0.0297193     39797   0.0271800

*** SHRULE ***
[SH]=[SH]                   646   0.0136419     29706   0.0088119     16224   0.0110804
  Group total               646   0.0136419     29706   0.0088119     16224   0.0110804

*** ZHRULE ***
[ZH]=[ZH]                    39   0.0008236      1864   0.0005529      1222   0.0008346
  Group total                39   0.0008236      1864   0.0005529      1222   0.0008346

*** HHRULE ***
(rule text not legible)     319   0.0067365     55364   0.0164229     14106   0.0096339
  Group total               319   0.0067365     55364   0.0164229     14106   0.0096339

*** CHRULE ***
[CH]=[T CH]                 297   0.0062719     20318   0.0060270      9481   0.0064752
  Group total               297   0.0062719     20318   0.0060270      9481   0.0064752

*** JHRULE ***
[JH]=[D J]                  406   0.0085737     17746   0.0052641     10038   0.0068556
  Group total               406   0.0085737     17746   0.0052641     10038   0.0068556

*** MRULE ***
[M]=[M]                    1520   0.0320987     99240   0.0294381     47574   0.0324914
  Group total              1520   0.0320987     99240   0.0294381     47574   0.0324914

Table 8 (continued)

                          No. of              Total Frequencies     Total No. of Texts
                          Words Matched       of Words Matched      for Words Matched
Rule                       Abs.   Relative       Abs.   Relative       Abs.   Relative

*** NRULE ***
[N]=[N]                    3403   0.0718630    250800   0.0743962    103930   0.0709805
  Group total              3403   0.0718630    250800   0.0743962    103930   0.0709805

*** NXRULE ***
[NX]=[NG]                   617   0.0130295     28255   0.0083814     18773   0.0128213
  Group total               617   0.0130295     28255   0.0083814     18773   0.0128213

*** LRULE ***
IY [L]=[I3 L]                87   0.0018372      4272   0.0012672      2757   0.0018829
EY [L]=[I3 L]                50   0.0010559      1864   0.0005529      1118   0.0007636
AY [L]=[I3 L]                32   0.0006758      2564   0.0007606      1345   0.0009186
OY [L]=[I3 L]                 9   0.0001901       244   0.0000724       123   0.0000840
AE [L]=[UH3 L]               65   0.0013726      2009   0.0005959      1129   0.0007711
AO [L]=[UH3 L]              115   0.0024285     10474   0.0031070      4607   0.0031464
OW [L]=[UH3 L]               55   0.0011615      3448   0.0010228      1977   0.0013502
[L]=[L]                    2033   0.0429320    103741   0.0307733     60518   0.0413317
  Group total              2446   0.0516535    128616   0.0381521     73574   0.0502485

*** WRULE ***
[W]=[W]                     393   0.0082992     55863   0.0165710     17447   0.0119157
  Group total               393   0.0082992     55863   0.0165710     17447   0.0119157

*** WHRULE ***
[WH]=[H W]                   30   0.0006335     11341   0.0033641      3268   0.0022319
  Group total                30   0.0006335     11341   0.0033641      3268   0.0022319

Table 8 (continued)

                          No. of              Total Frequencies     Total No. of Texts
                          Words Matched       of Words Matched      for Words Matched
Rule                       Abs.   Relative       Abs.   Relative       Abs.   Relative

*** RRULE ***
[R] L=[UH3 R]                26   0.0005491      1183   0.0003509       828   0.0005655
[R]=[R]                    2723   0.0575031    161348   0.0478616     81321   0.0555394
  Group total              2749   0.0580521    162531   0.0482125     82149   0.0561049

=======================================================================================
  Grand total             47354   1.0         3371138   1.0         1464204   1.0

DISCUSSION AND CONCLUSIONS

Our  results  demonstrate  that  a simple  algorithm  driven  by  a small  set  of  letter-to-sound 
rules  (fewer  than  350)  can  produce  correct  IPA  transcriptions  of  the  great  majority  of 
English  words  without  using  a large  pronouncing  dictionary;  with  the  same  algorithm,  driven 
by  a smaller  set  of  rules,  the  IPA  transcription  can  be  translated  into  a form  acceptable  to 
a commercial  speech  synthesizer.  Of  the  thousand  most  frequent  words  in  English,  the 
process  mispronounces  fewer  than  4%  if  words  are  counted  according  to  their  frequency 
of  occurrence.  The  error  rate  rises  with  decreasing  word  frequency.  However,  since  over 
2/3  of  the  words  in  a typical  sample  are  among  the  most  frequent  thousand,  the  program’s 
relatively  poor  performance  on  rare  words  does  not  drive  the  overall  error  rate  higher  than 
about  10%.  Thus  on  the  average  the  program  mispronounces  fewer  than  two  words  per 
sentence of ordinary written English. Most of the mispronunciations are single-phoneme
errors  and  are  easily  correctable  from  context. 
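
The arithmetic behind these figures can be checked in a few lines. The sketch below is written in Python rather than the SNOBOL of the listings, and it treats the quoted bounds (two thirds of running words among the top thousand, a 4% error rate on those, a 10% overall error rate) as exact values, which is an assumption made only for illustration.

```python
# Back-of-envelope check of the error-rate figures quoted above.
# Assumption: the bounds in the text are treated as exact values.

top_share = 2 / 3        # fraction of running words among the 1000 most frequent
top_error = 0.04         # frequency-weighted error rate on those words
overall_error = 0.10     # overall error rate on running text


def implied_rare_error(top_share, top_error, overall_error):
    """Error rate on the remaining words implied by the weighted average."""
    rare_share = 1 - top_share
    return (overall_error - top_share * top_error) / rare_share


def sentence_length_for(errors_per_sentence, overall_error):
    """Sentence length at which the expected error count reaches the given value."""
    return errors_per_sentence / overall_error


rare_error = implied_rare_error(top_share, top_error, overall_error)
max_len = sentence_length_for(2, overall_error)
print(f"implied error rate on rarer words: {rare_error:.0%}")
print(f"two errors expected at about {max_len:.0f} words per sentence")
```

Under these assumptions the words outside the top thousand would be mispronounced roughly one time in five, and "fewer than two words per sentence" corresponds to sentences shorter than about 20 words.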

It  has  proved  to  be  quite  easy  to  modify  the  rules  and  experiment  with  different 
versions. As a result we have been able, in passing from version 1 to version 3, to reduce the
error  rate  for  the  1000  most  frequent  words  from  an  initial  32%  to  the  present  4%  while 
increasing  the  number  of  rules  by  three  quarters. 

We  were  at  first  slightly  disappointed  and  more  than  slightly  puzzled  by  the  discrepancy 
between  our  performance  score  of  68%  for  version  1 of  the  rules  (frequency-weighted  score 
from  Table  5)  and  Ainsworth’s  reported  scores  of  89%  to  92%  for  his  set  of  rules  [10] . 
Since  our  version  1 is  derived  from  and  quite  similar  to  Ainsworth’s  set,  we  had  expected 
similar  performance  figures. 

Three  possible  explanations  suggest  themselves.  First,  the  difference  between  British 
and  American  pronunciation  is  more  than  a simple  matter  of  dropping  or  retaining  r’s  and 
replacement  of  one  sound  by  another.  Ainsworth’s  rules,  being  adapted  to  British  English, 
might therefore in various subtle ways be unamenable to Americanization by such straightforward
changes as we made while setting up version 1. Second, the question of what
pronunciations of a word are acceptable is by no means cut and dried, even when one has
a pronunciation dictionary at hand. Thus, although we had definite criteria in mind while
scoring translations, we were not able to avoid subjectivity entirely. It is a dubious business
at best to compare judgments of correctness arrived at independently under different
circumstances by different judges having different expectations and different temperaments.
Finally, the samples translated were different. The performance of a set of rules is sensitive
to the vocabulary level of the material it is applied to; Table 3 illustrates this clearly.
The  Brown  Corpus  includes  selections  in  a comprehensive  range  of  styles,  and  Ainsworth’s 
descriptions,  namely,  “textbook  on  phonetics,”  “modern  novel,”  and  “newspaper  article  on 
a political  theme”  [10],  do  not  pin  down  where  in  that  range  the  sources  of  his  samples 
fall;  they  may  be  written  plainly,  or  their  authors  may  have  salted  their  language  with  rare 
words.  The  actual  reason  for  the  discrepancy  in  scores  is  probably  some  combination  of 
these  three  explanations. 

Further  additions  and  refinements  to  the  rules  could  reduce  the  error  rate  still  further. 
At this point, however, it appears that any improvement in inflection (pitch, stress, and timing)
would be more beneficial than reducing the error rate by a few more percent. The present
system  produces  a flat  monotone  that  is  fatiguing  to  follow  for  long.  We  are  testing  some 
simple  heuristics  for  inflection.  The  listening-preference  tests  that  we  have  done  so  far  have 
led  us  to  three  conclusions:  Not  only  stress  and  pitch  but  rhythm  and  timing  are  important 
to producing acceptable inflection. Not just any stress pattern will do; for instance, neither
random stress nor strictly alternating stress was significantly preferred to the present monotone.
One scheme for adding inflection was significantly preferred to all others tested
except  those  involving  hand-inserted  “correct”  English  stress. 

This  preferred  scheme  bases  stress  assignment  on  two  notions  beyond  the  obvious  one 
of  falling  inflection  before  a period  and  rising  inflection  before  a question  mark.  The  first 
is that a number of letter-to-sound rules, even in their present form, are good predictors for
stressing and destressing. This is especially true of the rules with a schwa (/ə/) on the right-hand
side, those for common function words, and those for common endings like ES and
ED. The second is the tendency in English speech to stress approximately alternating syllables.
Stress is assigned by a simple algorithm that plays these two notions off against one
another.  Unstressed  syllables  are  given  lower  pitch  and  shorter  duration  than  they  would 
have  if  stressed.  The  timing  of  the  stressed  syllables  is  adjusted  by  further  reduction  of  the 
duration  of  adjacent  unstressed  syllables  and  by  lengthening  any  adjacent  stressed  syllables. 
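
The report does not spell out the stress algorithm, so the following Python sketch is only one hypothetical way the two notions could be played off against each other. It assumes each syllable arrives with a hint from the rule that produced it ("stress", "destress", or none), lets hints win over alternation, and uses invented duration factors.

```python
def assign_stress(hints):
    """hints: one of 'stress', 'destress', or None per syllable (assumed input)."""
    pattern = []
    for hint in hints:
        if hint == "stress":
            stressed = True
        elif hint == "destress":
            stressed = False
        else:
            # No prediction from the rules: alternate with the previous syllable.
            stressed = (not pattern[-1]) if pattern else True
        pattern.append(stressed)
    return pattern


def durations(pattern, base=1.0):
    """Relative syllable durations; the scale factors are illustrative only."""
    out = []
    for i, stressed in enumerate(pattern):
        neighbours = [pattern[j] for j in (i - 1, i + 1) if 0 <= j < len(pattern)]
        if stressed:
            # Lengthen a stressed syllable that sits next to another stressed one.
            out.append(base * (1.2 if any(neighbours) else 1.0))
        else:
            # Unstressed syllables are shorter, and shorter still beside a stress.
            out.append(base * (0.6 if any(neighbours) else 0.8))
    return out


pattern = assign_stress(["destress", None, None, "stress", "destress"])
print(pattern)
print(durations(pattern))
```

Pitch could be handled the same way as duration, with a lower value for unstressed syllables; it is left out here to keep the sketch short.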

Our  next  step  will  be  to  test  the  inflection  schemes  using  comprehension  measures  instead 
of  listening  preference.  Present  results  indicate  that  simple  intonation  rules  can  make  the 
output  of  a text-to-speech  program  easier  to  listen  to.  This  is  important  in  delaying  fatigue 
and  boredom  in  listening  to  machine  speech  for  long  periods.  The  comprehension  tests  are 
necessary,  since  naturalness  and  intelligibility  do  not  always  go  together,  and  one  is  sometimes 
attained  at  the  expense  of  the  other.  If  the  inflection  scheme  described  results  in  increased 
intelligibility  as  well  as  increased  naturalness,  this  will  be  an  important  advance. 

Another  task  already  in  progress  involves  tailoring  a version  of  the  rules  for  a special 
application.  We  are  putting  together  a data  base  of  words  pertinent  to  subjects  of  interest 
to  the  Navy.  We  intend  to  test  the  rules  by  translating  these  and,  if  it  is  necessary,  retune 
the  rules.  It  remains  to  be  seen  whether  the  statistics  of  the  ordinary  words  in  Navy-oriented 
English will require much reworking of the rules; however, acronyms must certainly be dealt
with. The pronunciations people give to unpronounceable combinations like WWMCCS
(/wɪmɪks/) are too arbitrary for any systematic procedure to have much hope of duplicating
them.  A more  reasonable  goal  is  to  pronounce  pronounceable  combinations  plausibly  and 
spell  out  unpronounceable  ones.  One  simple  expedient  is  to  pronounce  each  consonant  as 
its  name  when  the  context  is  an  isolated  cluster  consisting  entirely  of  consonants.  This 
already  catches  a good  number  of  important  acronyms  and  abbreviations  (e.g.,  NRL),  and 
the  idea  could  be  pushed  further. 
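
The consonant-cluster expedient can be sketched as a preprocessing step. The Python below is an illustration, not part of TRANS: the letter-name spellings are invented so that ordinary letter-to-sound rules could pronounce them, and treating Y as a vowel is an assumption.

```python
# Letter names in ordinary spelling; the spellings are illustrative only.
# Vowels (and Y, by assumption) are deliberately absent from the table.
LETTER_NAMES = {
    "B": "BEE", "C": "SEE", "D": "DEE", "F": "EFF", "G": "JEE", "H": "AITCH",
    "J": "JAY", "K": "KAY", "L": "ELL", "M": "EM", "N": "EN", "P": "PEE",
    "Q": "CUE", "R": "AR", "S": "ESS", "T": "TEE", "V": "VEE",
    "W": "DOUBLE YOU", "X": "EX", "Z": "ZEE",
}


def is_consonant_cluster(word):
    """True for an isolated word made up entirely of consonant letters."""
    return bool(word) and all(ch in LETTER_NAMES for ch in word)


def expand_acronyms(text):
    """Replace each all-consonant word by the names of its letters."""
    out = []
    for word in text.upper().split():
        if is_consonant_cluster(word):
            out.extend(LETTER_NAMES[ch] for ch in word)
        else:
            out.append(word)
    return " ".join(out)


print(expand_acronyms("THE NRL REPORT"))
print(expand_acronyms("WWMCCS"))
```

A word with attached punctuation (for example "NRL,") is left untouched by this sketch; a fuller version would separate punctuation before testing the cluster.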
Appendix A

PROGRAM  DOCUMENTATION  FOR  TRANS 

The  translation  program  was  designed  to  make  experimentation  with  the  letter-to-sound 
rules simple. This resulted in a program that, once written and debugged, required a minimum
of changing when the rules were altered.

The program starts by asking the user for input and output names: either the name
of a text file or ‘TTY’, meaning the terminal; for output ‘CAS’, meaning the cassette unit,
is also allowed. It then asks about various output options, including whether a stat file is
wanted and what translation is wanted. Table A1 indicates the possible translations. English is
arbitrary English text, IPA is text in the representation of the International Phonetic Alphabet
given in Table 1, Votrax text consists of the mnemonic names of Votrax synthesizer
codes, and ASCII is a representation of Votrax codes by pairs of ASCII characters that is
sometimes used in transmitting to a Votrax over serial ASCII communication channels.


Table A1
Legal Translations

Input String      Possible Output Strings

English           English, IPA, Votrax, or ASCII
IPA               IPA, Votrax, or ASCII
Votrax            Votrax or ASCII
ASCII             ASCII

After  questioning  the  user,  the  program  expects  a string  of  symbols  terminated  by  an 
end-of-text marker ‘#’. If it receives one, it translates the string, produces the requested
outputs, and looks for another such string. It keeps translating strings until it comes to
the end of the input file or encounters ‘###’ in an input string; at that point the user may
choose  whether  to  quit  or  to  start  over,  respecifying  file  names.  The  translation  consists  of 
any  of  the  following  three  passes  that  may  be  needed,  applied  in  order:  English  to  IPA, 
IPA  to  Votrax,  and  Votrax  to  ASCII.  Figure  A1  shows  a sample  dialog. 
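
Table A1 and the ordering of the three passes amount to a simple rule: the four representations form a chain, and a translation is legal only if the output lies at or after the input in that chain. A minimal Python sketch of that reading follows; the function and list names are hypothetical and do not appear in TRANS.

```python
# The four representations in the order the passes connect them.
FORMS = ["ENGLISH", "IPA", "VOTRAX", "ASCII"]
PASSES = ["English to IPA", "IPA to Votrax", "Votrax to ASCII"]


def passes_needed(source, target):
    """Return the passes, in order, that turn `source` text into `target` text."""
    i, j = FORMS.index(source), FORMS.index(target)
    if j < i:
        raise ValueError(f"{source} to {target} is not a legal translation")
    return PASSES[i:j]


print(passes_needed("ENGLISH", "VOTRAX"))
print(passes_needed("IPA", "IPA"))
```

An identical input and output type yields an empty list of passes, which matches the table's listing of English-to-English and the other same-type entries as legal.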

The  program  consists  of  four  major  sections: 

• the  rules, 

• the  function-name  declarations  and  initialization, 

• the  translation  routines,  and 

• the  service  routines. 


These  will  now  be  described  in  detail. 
START OF PROGRAM -- TRANS. LAST UPDATE APRIL 8, 1975
WHAT IS THE INPUT FILE NAME?

TTY 

WHAT  IS  THE  FILENAME  FOR  THE  TRANSLATION  RESULTS? 

TTY 

DO  YOU  WANT  TO  GATHER  STATISTICS? 

N 


WHAT  TRANSLATION  DO  YOU  WANT? 

E I 

TRANSLATION  OF  ENGLISH  TO  IPA  BEGINNING 
ENTER  TEXT  TERMINATED  BY  A # 

THE  TIME  HAS  COME,  THE  WALRUS  SAID, 

TO TALK OF MANY THINGS--

OF SHOES, AND SHIPS, AND SEALING WAX,

OF  CABBAGES  AND  KINGS,# 

IPA  RESULT  IS 

/DH AX//< >//T//AY//M// //< >//HH//AE//Z//< >//K AH M// //< >//<,>//< >
//DH AX//< >//W//AO L//R//AH//S//< >//S EH D//< >//<,>//< >//T UW//< >//
T//AO K//< >//AX V//< >//M//EH N IY//< >//TH//IH//NX//Z//< >//<->//<->//< >
//AX V//< >//SH//OW// //Z//< >//<,>//< >//AE//N//D//< >//SH//IH//P//S//<
>//<,>//< >//AE//N//D//< >//S//IY//L//IH//NX//< >//W//AE//K S//< >//<,>//<
>//AX V//< >//K//AE//B//B//IH JH//IH Z//< >//AE//N//D//< >//K//IH//NX//
Z//< >//<,>//< >/

ENTER  TEXT  TERMINATED  BY  A # 

AND  WHY  THE  SEA  IS  BOILING  HOT, 

AND WHETHER PIGS HAVE WINGS. #

IPA  RESULT  IS 

/AE//N//D//< >//WH//AY//< >//DH AX//< >//S//IY//< >//IH//Z//< >//B//OY/
/L//IH//NX//< >//HH//AA//T//< >//<,>//< >//AE//N//D//< >//WH//EH//DH ER/
/< >//P//IH//G//Z//< >//HH AE V// //< >//W//IH//NX//Z//< >//<.>//< >/

ENTER  TEXT  TERMINATED  BY  A # 

### 

EOF  ENCOUNTERED  IN  INPUT  FILE 
DO  YOU  WISH  TO  CONTINUE? 

Y 

WHAT  IS  THE  INPUT  FILE  NAME? 

TTY

WHAT  IS  THE  FILENAME  FOR  THE  TRANSLATION  RESULTS? 

TTY 

DO  YOU  WANT  TO  GATHER  STATISTICS? 

N 


WHAT  TRANSLATION  DO  YOU  WANT? 

E V 

TRANSLATION  OF  ENGLISH  TO  VOTRAX  BEGINNING 
ENTER  TEXT  TERMINATED  BY  A # 

AND  WHY  THE  SEA  IS  BOILING  HOT, 

AND WHETHER PIGS HAVE WINGS. #

VOTRAX RESULT IS

[AE][N][D][PA0][H W][AH E1][PA0][THV][UH2][PA0][S][E][PA0][I][Z][PA0][B][
O1 AY][I3 L][I][NG][PA0][H][AH][T][PA0][PA1][PA0][AE][N][D][PA0][H W][EH
][THV][ER][PA0][P][I][G][Z][PA0][H][AE][V][PA0][W][I][NG][Z][PA0][PA1 PA1][
PA0]

ENTER TEXT TERMINATED BY A #

### 


EOF  ENCOUNTERED  IN  INPUT  FILE 
DO  YOU  WISH  TO  CONTINUE? 

WHAT  IS  THE  INPUT  FILE  NAME? 

TTY 

WHAT  IS  THE  FILENAME  FOR  THE  TRANSLATION  RESULTS? 
IPAOUT 

TEXT  TO  FILE,  TOO? 

N 

DO  YOU  WANT  TO  GATHER  STATISTICS? 

Y 

WHAT  IS  THE  FILENAME? 

TTY 

WHAT TRANSLATION DO YOU WANT?

TRANSLATION  OF  ENGLISH  TO  IPA  BEGINNING 
ENTER  TEXT  TERMINATED  BY  A # 

SUICIDE# 

** SUICIDE #

[S]=/S/

@[U]=/UW/

[I]^+:#=/IH/

[C]+=/S/

[I]D%=/AY/

[D]=/D/

#:[E] =/ /

[ ]=/< >/

ENTER  TEXT  TERMINATED  BY  A # 

### 

EOF  ENCOUNTERED  IN  INPUT  FILE 
DO  YOU  WISH  TO  CONTINUE? 

N 

ALL  DONE 


Fig.  A1  — Sample  dialog  with  TRANS 
Rules

The  rules  section  contains  three  groups  of  translation  rules:  English  to  IPA,  IPA  to 
Votrax,  and  Votrax  to  ASCII.  The  English-to-IPA  and  IPA-to-Votrax  rules  have  the  form 

A[B]C=D, 

where  A,  B,  C,  and  D are  character  strings  and  B is  nonempty.  The  interpretation  is  that 
in  the  left  and  right  contexts  specified  by  A and  C,  the  string  B is  to  be  translated  to  D. 
The  English-to-IPA  part  of  the  rules  section  initializes  variables  ARULE.ENG,  BRULE.ENG, 
. . . , ZRULE.ENG,  assigning  to  each  a string  of  the  form 

rule\rule\. . .\rule\. 

The  string  assigned  to  ARULE.ENG  contains,  in  their  proper  order,  all  the  A rules: 
the  rules  where  A is  the  first  character  in  brackets.  The  other  letters  of  the  alphabet  are  handled 
similarly,  and  there  are  in  addition  variables  NUMBERRULE.ENG  and  PUNCTRULE.ENG 
that  contain  the  rules  for  translating  digits  and  punctuation  marks.  The  IPA-to-Votrax  part 
initializes  variables  IYRULE.IPA,  . . . , RRULE.IPA,  and  PUNCTRULE.IPA  in  the  same  way. 
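
To make the storage format concrete, here is a minimal Python sketch (not the SNOBOL of the listing) that splits a backslash-delimited rule string into its A, B, C, and D parts. The sample rule strings are illustrative stand-ins, and the regular expression assumes that A contains no ‘[’ and that C contains no ‘=’.

```python
import re

# A[B]C=D, with B nonempty. Assumes A has no '[' and C has no '='.
RULE = re.compile(r"^(?P<left>[^\[]*)\[(?P<body>[^\]]+)\](?P<right>[^=]*)=(?P<out>.*)$")


def parse_rule_string(rule_string):
    """Split 'rule\\rule\\...\\rule\\' into a list of (A, B, C, D) tuples."""
    rules = []
    for text in rule_string.split("\\"):
        if not text:
            continue  # the trailing backslash leaves an empty last piece
        m = RULE.match(text)
        if m is None:
            raise ValueError(f"malformed rule: {text!r}")
        rules.append((m["left"], m["body"], m["right"], m["out"]))
    return rules


# Illustrative rule strings; the real ones are in the rules section of TRANS.
CRULE_ENG = "[C]+=/S/\\[C]=/K/\\"
ERULE_ENG = "#:[E] =/ /\\"

print(parse_rule_string(CRULE_ENG))
print(parse_rule_string(ERULE_ENG))
```

Because the order of rules within a string is significant, the parser returns a list rather than a dictionary keyed on the bracketed part.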

Changing  the  translation  rules  requires  no  program  changes  except  revising  the  rule 
text  as  it  appears  explicitly  in  the  rules  section.  Since  the  English-to-IPA  and  IPA-to-Votrax 
rules  are  needed  by  programs  other  than  TRANS,  these  parts  of  the  rules  section  are  kept 
in a separate file and combined with the rest of TRANS (or another program) for compilation.


The  Votrax-to-ASCII  rules  vary  from  the  format  described.  There  is  a one-to-one 
correspondence  between  the  Votrax  codes  and  their  ASCII  representations;  consequently 
the  rules  are  not  context  sensitive  as  are  the  other  rules.  Therefore  each  Votrax  code 
names  a rule  consisting  only  of  the  ASCII  pair  corresponding  to  the  code.  For  instance 
the variable AE1.CODE becomes ‘OJ’, since OJ is the ASCII representation for the Votrax
code  AE1. 


Function-name  Declarations  and  Initialization 

After  the  rules  section,  the  program  listing  has  function  declarations  and  initialization 
of  some  often-used  patterns  and  other  variables.  The  function  declarations  define  function 
names  and  formal  parameters.  The  code  for  each  function  is  in  the  program  body,  with 
the  function  name  labeling  the  first  statement. 


Translation  Routines 

The  translation  routines  for  English  to  IPA  and  IPA  to  Votrax  are  TRANSLATETEXT 
and  VOTRAXTRANSLATE,  both  of  which  call  on  the  routine  TRANSLATE  to  do  most 
of  the  work. 

TRANSLATETEXT  starts  with  a pointer  I at  position  1 of  the  input  string  and  places 
the  character  pointed  to  in  a variable  CHAR.  If  the  character  is  not  the  end-of-text  marker, 
TRANSLATE is invoked with three parameters: the input text, CHAR, and an indication
‘ENG’  of  which  set  of  rules  to  use.  TRANSLATE  determines  the  rule  that  matches  and 
returns  the  translation  result  as  a value  to  be  concatenated  with  any  previous  results. 
TRANSLATE  also  sets  a variable  INCVALUE  to  the  length  of  the  bracketed  substring  in 
the  rule.  TRANSLATETEXT  uses  this  information  to  set  the  pointer  to  the  next  character 
that  must  be  translated.  The  character  is  placed  in  CHAR,  and  the  process  repeats.  On 
encountering  the  end-of-text  marker,  the  routine  returns  to  the  calling  program. 
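
The scanning loop just described can be sketched as follows. This is a Python illustration rather than the SNOBOL routine: the index is 0-based, the increment is returned alongside the phonemes instead of being left in a global INCVALUE, and `toy_translate` is a made-up stand-in for TRANSLATE.

```python
END_OF_TEXT = "#"


def translate_text(text, translate):
    """Scan `text` left to right, concatenating the results of `translate`."""
    result = ""
    i = 0                                  # the report's pointer I (1-based there)
    while i < len(text) and text[i] != END_OF_TEXT:
        phonemes, inc_value = translate(text, i)
        result += phonemes
        i += inc_value                     # skip the whole bracketed substring
    return result


def toy_translate(text, i):
    """Stand-in for TRANSLATE with two made-up rules, for illustration only."""
    if text.startswith("TH", i):
        return "/TH/", 2                   # a two-letter bracketed part
    return "/" + text[i] + "/", 1          # default: one letter at a time


print(translate_text("THAT#", toy_translate))
```

The point of the increment is visible in the example: after the two-letter match on TH the pointer moves two positions, so the H is never scanned on its own.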

VOTRAXTRANSLATE  works  much  like  TRANSLATETEXT.  Minor  differences  in  the 
details  of  the  scan  are  due  to  differences  in  the  format  of  the  input:  the  IPA  symbols  are 
represented  by  one-  and  two-character  combinations  delimited  by  blanks  or  slashes,  and  the 
symbols of English text are represented as single characters without delimiters. TRANSLATE
is given ‘IPA’, rather than ‘ENG’, as the indication of which rules to use.


TRANSLATE  takes  three  arguments:  the  text  being  translated  (BUF),  the  character 
or  symbol  currently  being  scanned  (GRAPHEME),  and  an  indicator  of  the  set  of  rules  to 
use (QUAL). The routine replaces GRAPHEME by ‘PUNCT’ or ‘NUMBER’ if it is a punctuation
symbol or a digit, then builds from GRAPHEME and QUAL the name of a string of
rules to search for a match. For instance, if the last two arguments were ‘A’ and ‘ENG’,
it would use ARULE.ENG, which contains the A rules for English-to-IPA translation. The
routine then sequentially breaks off rules (substrings delimited by ‘\’) from the rule string
until it either runs out of rules or finds a rule whose left-hand side matches the text at the
current position. In the first case it gives an error message; in the second case it returns the
right-hand side of the rule as a function value and puts the length of the bracketed part of
the left-hand side in INCVALUE to indicate how many spaces the pointer should be moved
before  resuming  the  scan. 
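
A rough Python sketch of this lookup-and-search step follows. It is not the SNOBOL routine: the rule table holds a few illustrative rules only, contexts are matched as literal text (special symbols are ignored here), anything that is neither a letter nor a digit is assumed to count as punctuation, and the text position is passed explicitly.

```python
# Illustrative rule strings only; names follow the ARULE.ENG convention.
RULES = {
    "ARULE.ENG": "[A] =/AX/\\[A]=/AE/\\",
    "PUNCTRULE.ENG": "[,]=/<,>/\\",
}


def split_rule(rule):
    """Break 'A[B]C=D' into its four pieces (simple literal rules only)."""
    left, rest = rule.split("[", 1)
    body, rest = rest.split("]", 1)
    right, out = rest.split("=", 1)
    return left, body, right, out


def translate(buf, pos, qual):
    """Return (right-hand side, length of bracketed part) of the first match."""
    grapheme = buf[pos]
    if grapheme.isdigit():
        grapheme = "NUMBER"
    elif not grapheme.isalpha():
        grapheme = "PUNCT"                 # assumption: all other symbols
    name = grapheme + "RULE." + qual       # e.g. 'A' and 'ENG' give ARULE.ENG
    for rule in RULES.get(name, "").split("\\"):
        if not rule:
            continue
        left, body, right, out = split_rule(rule)
        if (buf.startswith(body, pos)
                and buf.endswith(left, 0, pos)
                and buf.startswith(right, pos + len(body))):
            return out, len(body)
    raise LookupError(f"no rule in {name} matches at position {pos}")


print(translate("A CAT", 0, "ENG"))
print(translate("A CAT", 3, "ENG"))
```

Because the search is sequential, the more specific rule (the isolated word A) has to come before the default rule in the string; reversing them would make the default match first every time.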

As  each  rule  is  broken  from  the  rule  string,  it  is  in  turn  broken  into  four  pieces  called 
BACKCHAR, CHARDEF, FORCHAR, and PHONEME. These pieces correspond to A, B,
C, and D in the notation where A[B]C=D is the form of a rule. From the first three is
built a SNOBOL pattern that tests whether the left-hand side of the rule matches the text
at the appropriate position. Both BACKCHAR and FORCHAR are examined for special
symbols. If none are found, BACKCHAR and FORCHAR, as they stand, are used in
building the pattern. If any special symbols do occur, the code starting at SPECIALCASEPROC
is executed. This code builds the necessary pattern by applying the function
SPECIALBREAK to BACKCHAR and to FORCHAR.

SPECIALBREAK  breaks  its  argument  into  (a)  strings  free  of  special  symbols  and  (b) 
special  symbols.  These  are  concatenated  back  together  in  the  same  order  with  each  special 
symbol  replaced  by  its  corresponding  pattern.  The  routine  works  on  the  input  string  from 
left  to  right:  it  breaks  off  (a)  everything  up  to  the  first  special  symbol  and  (b)  the  first 
special  symbol;  then  it  concatenates  the  initial  string  and  the  pattern  corresponding  to  the 
symbol  onto  the  end  of  the  partial  result  (originally  null);  this  continues  until  no  special 
symbols  remain,  at  which  point  what  is  left  of  the  input  string  is  added  to  the  result  and 
the  function  returns. 
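
The left-to-right splitting can be illustrated with regular expressions standing in for SNOBOL patterns. In the Python sketch below only three special symbols are included, with the meanings given in the comments of the program listing; the exact letter sets (Y counted as a vowel, the list of voiced consonants) are assumptions.

```python
import re

# Pattern for each special symbol; the letter sets are assumptions.
PATTERNS = {
    "#": "[AEIOUY]+",                   # one or more vowels
    "*": "[BCDFGHJKLMNPQRSTVWXZ]+",     # one or more consonants
    ".": "[BDVGJLMNRWZ]",               # a voiced consonant
}
SPECIALCASE = "".join(PATTERNS)         # the string of special symbols


def special_break(context):
    """Rebuild `context` with each special symbol replaced by its pattern."""
    result = ""                         # the partial result, originally null
    rest = context
    while True:
        hits = [i for i, ch in enumerate(rest) if ch in SPECIALCASE]
        if not hits:
            # No special symbols remain: append what is left and return.
            return result + re.escape(rest)
        i = hits[0]
        result += re.escape(rest[:i]) + PATTERNS[rest[i]]
        rest = rest[i + 1:]


print(special_break("T#"))
print(bool(re.fullmatch(special_break("*#"), "STREE")))
```

In this representation, adding a new special symbol means adding one entry to `PATTERNS`; `SPECIALCASE` is derived from the table, so the two cannot drift apart.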


The patterns corresponding to the special symbols ‘*’, ‘#’, . . . are in variables whose
names are ‘PATTERN*’, ‘PATTERN#’, . . . and must be referred to by writing
$‘PATTERN*’, $‘PATTERN#’, . . . , since the names are not legal identifiers. For the
introduction  of  a new  special  symbol,  say  ‘?’,  only  two  steps  are  necessary: 

1.  Add ‘?’ to the string of special symbols in the variable SPECIALCASE.

2.  Write  the  desired  SNOBOL  pattern  and  assign  it  to  the  appropriate  variable  with  an 
assignment  statement  of  the  form 

$‘PATTERN?’  = SNOBOL  pattern. 

A third  step  is  desirable: 

3.  Update  the  comments  to  reflect  the  addition. 

The  same  changes  should  probably  be  made  in  DICT  at  the  same  time. 

ASCII  translates  a Votrax  code  string  into  the  ASCII  representation. 


Service  Routines 

The  remainder  of  the  SNOBOL  program  contains  service  routines  to  decide  the  type 
of  translation,  to  set  up  file  name  definitions,  to  input  the  text  from  a specified  file,  to 
output  translation  results  to  a specified  file,  to  gather  statistics  on  the  rules  used  in  a 
translation,  and  to  make  the  correct  sequence  of  function  calls  to  perform  the  translation 
requested. 

Initially  the  program  invokes  a routine  called  FILEDEFINE.  This  routine  asks  for  the 
input  and  output  file  names  and  sets  up  the  correct  correspondence  for  the  computer 
system. It also sets flags to indicate whether the input text should also be written to the output
file and whether statistics should be recorded to a named file. This is all at the user’s
option.  Finally  FILEDEFINE  makes  the  logical  file-name  correspondences  of  INPUTTEXT 
for  the  input  file,  STATISTICS  for  the  statistics  file,  and  TRANSTEXT  for  the  translation 
results  file. 

After  file  definitions,  the  user  must  indicate  the  type  of  translation  wanted.  The 
routine  CLI  reads  this  information,  which  may  be  in  abbreviated  form,  and  expands  the 
input  type  and  output  type  to  their  full  spellings,  placing  these  results  in  the  variables  IN 
and  OUT.  Then  the  program  branches  to  the  statement  labeled  by  the  contents  of  IN 
concatenated  with  OUT.  The  code  at  each  of  these  points  invokes  TRANSLATETEXT, 
VOTRAXTRANSLATE, or ASCII with the appropriate parameters to produce the
translation requested.
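
The abbreviation expansion and the branch on IN concatenated with OUT can be sketched as a table lookup. This is a hedged Python illustration, not the SNOBOL routine: expand, run, and HANDLERS are hypothetical names, and the three stage functions are placeholders that only tag their input so the chaining is visible. The reflexive and unimplemented cases follow the comments in the program listing.

```python
TYPES = ("ENGLISH", "IPA", "VOTRAX", "ASCII")


def expand(abbrev: str) -> str:
    """Expand an abbreviated type name ('E', 'VOT', ...) to its full spelling."""
    word = abbrev.strip().upper()
    matches = [t for t in TYPES if word and t.startswith(word)]
    if len(matches) != 1:
        raise ValueError(f"unrecognized translation type: {abbrev!r}")
    return matches[0]


# Placeholder stages (assumptions for the example): each tags its input.
def translate_text(text: str) -> str:
    return f"ipa({text})"


def votrax_translate(ipa: str) -> str:
    return f"votrax({ipa})"


def ascii_codes(votrax: str) -> str:
    return f"ascii({votrax})"


# One entry per implemented IN+OUT label; the key is the concatenation.
HANDLERS = {
    "ENGLISHIPA": translate_text,
    "ENGLISHVOTRAX": lambda t: votrax_translate(translate_text(t)),
    "ENGLISHASCII": lambda t: ascii_codes(votrax_translate(translate_text(t))),
    "IPAVOTRAX": votrax_translate,
    "IPAASCII": lambda t: ascii_codes(votrax_translate(t)),
    "VOTRAXASCII": ascii_codes,
}


def run(in_type: str, out_type: str, text: str) -> str:
    """Dispatch on the expanded input and output types."""
    src, dst = expand(in_type), expand(out_type)
    if src == dst:
        return text  # reflexive translations just pass the text through
    handler = HANDLERS.get(src + dst)
    if handler is None:
        raise NotImplementedError(f"translation of {src} to {dst} not implemented")
    return handler(text)
```

For example, run("E", "V", "HI") chains the English-to-IPA and IPA-to-Votrax stages, while run("A", "E", "HI") raises NotImplementedError, matching the listing's "not implemented" branches.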

FILEOUT outputs Votrax code to a file in a format compatible with a TI 733 cassette.
TRANS Program Listing


****************************************************************
*
*
*   *****  TRANS  *****
*
*
*   THIS IS THE TRANSLATION PROGRAM WHICH INPUTS
*   ENGLISH TEXT AND TRANSLATES TO PHONEMES.
*   IT IS WRITTEN IN SNOBOL FOR THE PDP10.
*   IT WILL REDEFINE FILES ON ENCOUNTERING AN EOF
*   OR ON SEEING A ### STARTING IN POSITION 1 OF THE INPUT STRING.
*   IT ALSO PROVIDES THE FACILITY TO TRANSFER INTERMEDIATE AND FINAL
*   OUTPUT RESULTS TO A PREDEFINED FILE OR TO THE TTY.
*   OUTPUT TO A FILE IS IN A FORM COMPATIBLE WITH THE SPEECH LAB.
*   IF THE CASSETTE IS SPECIFIED IT WILL OUTPUT IN A FORM COMPATIBLE
*   TO THE SPEECH LAB.
*
****************************************************************
*
*

****************************************************************
*
*   *****  ENGLISH TO IPA TRANSLATION RULES  *****
*
*
*   IN THESE RULES SOME SPECIAL SYMBOLS SERVE AS KEYWORDS.
*   THIS SPECIAL CONNOTATION HOLDS UNLESS THE SYMBOL
*   APPEARS BETWEEN BRACKETS; THEN IT DENOTES ITSELF.
*
*   #  =  1 OR MORE VOWELS
*   *  =  1 OR MORE CONSONANTS
*   .  =  A VOICED CONSONANT
*   $  =  SINGLE CONSONANT FOLLOWED BY AN 'I' OR 'E'
*   %  =  SUFFIX SUCH AS 'E','ES','ED','ER','ING','ELY'
*   &  =  A SIBILANT
*   @  =  A CONSONANT AFTER WHICH LONG 'U' IS PRONOUNCED
*         AS IN 'RULE', NOT 'MULE'
*   ^  =  A SINGLE CONSONANT
*   +  =  A FRONT VOWEL: 'E','I','Y'
*   :  =  0 OR MORE CONSONANTS
*
*
****************************************************************
PUNCTRULE.ENG =


♦ 

*C  ]'«/  /\* 

' £ - ]-/  /V 

'C  ]«/<  >/v 

♦ 

'£-]»/<->/ V 

* 

*.  I'  swz/v 

♦ 

*#«.E  C'  S J -/Z/V 

♦ 

*i  I'  S J-/Z/V 

*1'  ]»/  /V 

♦ 

'£,  J-/<f>/V 

♦ 

'£  • ]«/<.>/V 

♦ 

'C?w<?>/v 

ARULE.ENG =

♦ 

'CA]  -/AX/V 

♦ 

' CARE]  «/AA  R/V 

♦ 

' CAR30-/AX  R/V 

* 

'£ARJ#»/EH  R/V 

* 

' ^tAS]#»/EY  S/V 

♦ 

'£A]WA»/AX/V 

* 

'IAWJ-/AO/V 

+ 

' KANYWEH  N IY/V 

+ 

'£Ar+#»/EY/V 

* 

'#*CALLY]«/AX  L IY/V 

t 

' IAL]#-/AX  L/V 

* 

'CAGAIN]«/AX  G EH  N/V 

* 

'#*CAG]E«/IH  JH/V 

* 

'CAr+nM/AE/V 

* 

' *tAr+  «/EY/V 

* 

'£Arx»/EY/V 

' IARRJ-/AX  R/V 

'CARRWAE  R/V 

♦ 

' «£AR]  -/AA  R/V 

'CAR]  «/ER/V 

♦ 

'CAR1-/AA  R/V 

+ 

'CAIRJ-/EH  R/V 

'CAI3-/EY/V 

♦ 

'CAYJ-/EY/V 

* 

'CAUJ-/A0/V 

* 

'#*CAL]  -/AX  L/V 

* 

'#*CALS3  -/AX  L Z/V 

.♦ 

'CALK]»/AO  K/V 

if 

'CALP-/AO  L/V 

* 

' *CABLE]»/EY  B AX  L/V 

.+ 

'CABLE] -/AX  B AX  L/V  . 

♦ 

'CANGJ+-/EY  N JH/V 

+ 

'CAJ-/AE/V 

* 
BRULE.ENG =

' tBEr#»/B  IH/V 
-'[BEINGl-ZB  IY  IH  NX/V 

* [BOTH]  »/B  OH  TH/V 
-*  [BUS3#-/B  IH  Z/V 

RIJTMa/R  IH  LA-' 
•'[BJ-/BA' 

CRULE.ENG =

* CCHr»/K/V 
'-E[CHWK/V 
'ICHJ-/CH/V 

* SICI]#-/S  AY  A' 
-'ICI1A-/SH/V 
-'[CIJO-/SH/V 
'CCI1EK-/SK/V 
'XCJ+-/S/V 
'[CK1-/K/V 

,»t  (YIIMV-/V'  1U  UA/ 

4 vUm  j "»— » i\  r\ 4 i mr  « 

■'[CWK/V 

DRULE.ENG =

•'#*  £DEQ]  «/D  IH  DA' 
'.EID]  -/DA' 

^t'JEID]  -/T/V 
' [DEr#-/D-  IH/V 
■*  [DO]  */D  UH/V 

* [DOESI-/D  AH  Z/V 

' [D0ING3-/D  UH  IH  NX/V 
' [DOHJ-/D  AH/V 
'XDU1A-/JH  UH/V 
'IDJ-/0A' 


V*”  V l/* 

i « w * ^ 
ERULE.ENG =

-/#*£E3  «/  /V 
**  — « CE3  */  /V 

* • C E 3 »/IY/V 
-'#£ED3  */D/V 
•'♦ifElD  */  /V 
•'£EV3ER*/EH  V/V 
'CErx«/IY/V 
-'CERI ]#=/IY  R IY/V 
'tERI  3*/£H  R IH/V 
-'#*£ER3#*/ER/V 
'£ER3#=*/EH  R/V 
'£ER3«/ER/V 

* £EVEN3«/IY  V EH  N/V 
>#«£E3W»/  /V 
•'OIEWJa/UH/V 
<IENJ*/Y  UW/V 
-'CEIO/IY/V 
-'#*&£ES3  »/IH  Z/V 
-'#* £E3S  »/  /V 
•'#t£ELY]  «/L  IY/V 
-'#i£EUENT3*/M  EH  N T/V 
'CEFULWF  UH  L/V 
-aEEWIY/V 
•/£EAR?J3-/ER  N/V 

* tEAR3WER/V 
«'£EAD3*/EH  D/V 
■*#»  IEAI  »/IY  AX/V 
' £ EA  3 SU=/EH/ V 
'IEA]-/IY/\' 
•^lEIGHJa/EY/V 
^£EI]-/IY/V 

•*  E EYE  WAY/ V 
•*£EY.]»/IY/V 
'£EU3-/Y  UW/V 
•>£E]»/EH/V 

FRULE.ENG =

/£FUL3»/F  UH  l yv 
-^£F3»/F/V 
GRULE.ENG =

'CGIVJa/G  IH  V/V 
' tGir*/G/V 
'CGE]T=/G  EH/V 
'SUCGGES]=/G  JH  EH  S/V 
'CGG]»/G/V 

j a*r n.ys/ei/\-f 

'CG]+=/JH/V 
'C GREAT] */G  R EY  T/V 
'#CGHJ«/  /V 
'CGWG/V 

HRULE.ENG =

' CHAVWHH  AE  V/V 
' CHERE3-/HH  IY  R/V 
' CHOUR]*/AW  ERA' 
'[HOHJa/HH  AHA' 
'CH]#«/HH/V 
'CH1»/  /V 

IRULE.ENG =

' CINWIH  NA' 

' II]  a/AY/V 
'IINlDa/AY  N/V 
'CIERWIY  ER/V 
'#*  R[  IEDJ  a/IY  D/V 
'II ED]  «/ AY  D/V 
'I  IEN3*/IY  EH  NA' 
'CIE]T«/AY  EH/V 
- *CI]%a/AY/V 
'[I1X-/IY/V 
'CIEWIY/V 
'crr+*#a/iH/v 
'Cl R]#*/AY  R/V 
'IIZ]Xa/AY  Z/V 
'CIS]X*/AY  Z/V 
'CI]DXa/AY/V 
'+^cir+a/iH/v 
'CI ]TX*/AY/V 
'#~iCir+-/IH/V 
'Cir+a/AY/V 
'CIRWER/V 
'UGH  3 a/AY/V 
'CILD]a/AY  L D/V 
'CIGN3  a/AY  N/V 
'C IGNl^a/AY  N/V 
't IGN]%a/AY  N/V 
'CIQUE]a/IY  K/V 
'CIWIH/V 




NRL  REPORT  7948 


JRULE.ENG =

vt  J]*/JH/V 

KRULE.ENG =

■*  CK3N*/  /V 
-'tK]»/K/V 

LRULE.ENG =

■'CLO]C#»/L  OW/V 
'LIL]-/  /V 
■'#*«!  LI  WAX  L/V 
'tLEADWL  IY  D/V 
■'CLWL/V 

MRULE.ENG =

'IMOVWM  UW  V/V 
'IMWM/V 

NRULE.ENG =

•'EtNGJWN  JH/V 
^[NG]  ft«/NX  G/V 
VING]#»/NX  G/V 
-/CNGLl%a,/NX  G AX  L/V 
•'CNCWNX/V 
-'INKWNX  K/V 
' £NOW]  =/N  AH/V 
•/CNJ«/N/V 
ORULE.ENG =


♦ 

MOF]  ~/AX  V/V  . 

* 

M0R0UGH3=/ER  OW/V 

♦ 

MU  COR]  */Efi/V 

<¥ 

'#*rORSl  =/ER  Z/V 

* 

M0R3-/A0  R/V 

♦ 

* C0NE3-/W  AH  N/V 

<* 

MQW3*/0W/V 

* 

•*  C0VER3-/0W  V ER/V 

* 

MQVJ*/AH  V/V 

* 

moj~x=/ow/v 

.♦ 

mo]-en=/ow/v 

* 

Mon#»/aw/v 

♦ 

M0L3D=/0H  L/V 

* 

M OUGHT 3 “/AO  T/V 

+ 

-M0UGH3*/AH  F/V 

*■ 

* C0U3-/AW/V 

*■ 

•'H[0U3S#=»/AW/V 

* 

M0US3-/AX  S/V 

.+ 

M0UR3-/A0  R/V 

* 

MOULD]*/UH  D/V 

♦ 

*'M0U1~L»/AH/V 

* 

M0UP3*/UW  P/V 

* 

■M0U3*/AW/V 

♦ 

M0Y3-/0Y/V 

+ 

MOING3*/OW  IH  NX/V 

♦ 

MOIJ-/OY/V 

♦ 

M00R3-/A0  R'V 

♦ 

■MOOK]“/UH  K/V 

+ 

MOOD3»/UH  D/V 

+ 

M00J-/UV1/V 

♦ 

mo]e*/ow/v 

+ 

-'Coj  -/aw/v 

+ 

■MOAJ»/OW/V 

* 

■>  C0NLY3=/0W  N L IY/V 

* 

J C0NCE3*/W  AH  N S/V 

♦ 

MON  * T3=*/0W  N T/V 

* 

-'CC03N«/AA/V 

♦ 

M03NG*/AC/V 

* M[OJN=/AH/V 

+ 

MtQN3*/AX  N/V 

* 

•*§t  [ON]  */AX  N/V 

* 

-'#M0N3=/AX  n/v 

♦ 

MO]ST  */OW/V 

* 

M0F3"**/A0  F/V 

+ 

* [OTHER  3»/AH  DH  ER/V 

* 

-'[OSS]  */A0  S/V 

* 

C OM  3 »/AH  M/V 

* 

* 

M03-/AA/V 
PRULE.ENG =

•+  phwf/v 

+ •'CPEOPWP  IY  PA' 

•'tPOW]=/P  AW/V 

+ ■'[PUT]  -/?  UH  TA' 

♦ ■'CPJa/PA' 

* 

QRULE.ENG =

♦ ^IQUARJa/K  W AO  RA' 

* 'IQU)*/K  WA' 

♦ VCQJ*/KA' 

* 




RRULE.ENG =

v CREr#*/R  IYA' 
*1  RWRA' 


SRULE.ENG =

•'tSHJa/SHA' 
-'#CSION]-/ZH  AX  NA' 
'CSOMEWS  AH  MA' 
-'#ISUR]#*/ZH  ERA' 
-'ISUR]#*/SH  ERA' 
-/#CSU]#*/ZH  UHA' 
•'#CSSU]#a/SH  UWA' 
-M'CSED]  »/Z  DA' 
'#IS]#-/ZA' 
•'ISAIDWS  EH  DA' 
-/',*tSIQN]=/SH  AX  NA' 
•'ISIS-/  A' 

-'.[S]  a/ZA' 

•'#«  .E(S]  a/Z/\' 
-'#**##£S1  a/ZA' 
-/SA' 

•'UtS]  a/SA' 

' «#£SJ  »/Z/\' 
CSCHWS  K/V 
•'£S]C+»/  A' 
-'#£SMJ»/Z  MA' 

»#CSN3  VZ  AX  NAM 
'(S]*/SA' 


'.V 

!.S 
TRULE.ENG =


♦ 

' (THE!  =/DH  AX/V 

* 

'(TO]  =/T  UW/V 

* 

■'(THAT]  =/QH  AE  T/V 

* 

' (THIS]  =/DH  IH  S/V 

* 

' (THEYl^/DH  EY/\/ 

* 

' (THERE ]»/DH  EH  R/V 

♦ 

-/(THER3=/DH  ER/V 

♦ 

'(THEIR]=/DH  EH  R/V 

-f 

' (THAN]  -/OH  AE  N/V 

♦ 

' (THEM]  »/DH  EH  M/V 

♦ 

'(THESE!  */DH  IY  Z/V 

+ 

' (THENWDH  EH  N/V 

+ 

'(THROUGH  3 «/TH  R UW/V 

♦ 

'( THOSE ]«/DH  OW  Z/V 

+ 

'(THOUGH]  -/OH  OW/V 

♦ 

' (THUSWDH  AH  S/V 

* 

'(TH]*/TH/V 

+ 

'#* (TED]  */T  IH  D/V  ’ 

+ 

'S(TI]#N*/CH/V 

+ 

'(TIJO-/SH/V 

♦ 

'(TI]A*/SH/V 

♦ 

'(TIENWSH  AX  N/V 

♦ 

'(TUR]#=/CH  ER/V 

♦ 

'(TU3A-/CH  UH/V 

+ 

' (TWOWT  UW/V 

+ 

'(T1-/T/V 

URULE.ENG =

+ 

' (UN]I*/Y  UW  N/\' 

J- 

' (UN  3*/ AH  N/V 

* 

' (UPON WAX  P AO  N/V 

+ 

'0(UR]#»/UH  R/V 

* 

'(URW/Y  UH  R/V  ■ 

* 

'(URWER/V 

♦ 

'(UP  »/AH/V 

+ 

'(UrWAH/V 

* 

'(UYWAY/V 

* 

' GIU]#*/  /V 

* 

'G(U3W  /V 

* 

'G(U3#»/H/V 

+ 

'#N(U3*/Y  UW/V 

+ 

'0(UWUW/V 

* 

'(UWY  UH/V 

w 

VRULE.ENG =

♦ 

'(VIEWWV  Y UW/V 

+ 

* 

'(VWV/V 
WRULE.ENG =

* £WEREWW  ER/V 
'CWA3S*/W  AA/V 
■'fWAJTs/W  AA/V 
-/£HHERE]»/WH  EH  R/V 
■'IHHA1  J*/HH  AA  1/V 
-£WhuLj*/HH  OW  L/V 
'IHHQ]=/HH  UH/V 
'£HH3*/WH/\/ 
V[WAR]»/W  AO  R/V 
-/£HOR]'*=»/H  ER/V 
-'IHRWR/V 
'IHWW/V 

XRULE.ENG =

^tXI«/K  S/V 

YRULE.ENG =

+ 1 YOUNG  WY  AH  NX/V 
' tYOUWY  UH/V 

* £YESWY  EH  S/V 

* tYWY/V 
^r*tCYJ  */IY/V 
•*§**  C Y J I*/IY/V 

* CY3  »/AY/V 
' *£Y1#»/AY/V 
' «£Yr+*#»/IH/V 
' «£Y]”*#*/AY/V 
■/£Y}»/IH/V 

ZRULE.ENG =

■'IZWZ/V 

NUMBERRULE.ENG =

-/£0]*/Z  IH  R GH/V 
'IIWH  AH  N/V 
'12  ]»/T  UH/V 
v 13  ]=/TH  R IY/V 
•/£4  J»/F  OH  R/V 
'£5]*/F  AY  V/V 
•'[dWS  IH  K S/V 
-'£7]*/S  EH  V AX  N/V 
•'C8WEY  T/V 
'£9]»/N  AY  N/V 
****************************************************************
*
*   *****  IPA TO VOTRAX TRANSLATION RULES  *****
*
****************************************************************


I Y RULE.  IPA  « *{  IYJ*£EJV 

IHRULE.  IPA  * MIHHI1V 

EYRULE.IPA  = 'l  I EY]  R=EUH3  A1  I3JV 
-'L  EEY3-CUH3  AJ  AY3V 
'EEYJ  R»EA  I3JV 
-/£EY3=*£A  AY  IN' 

EHRULE.  IPA  « 'L  EEH3"£UH3  fcH3V 
'EEH3-EEH3V 

AERULE.IPA  • ■'l  £ AE 3 R-CUH3  AE  EH33V 
•'L  £AE3*E  UH3  AE3V 
-'tAEJ  R*C AE1  EH33V 
-/IAE3=£AE]  V 

AARULE.IPA  « ^£AA3-£AHJV 

AORULE. IPA  » yL  £AO]  R-EUH3  03  V 
■*L  IAQ]  ER«EUH3  AW  02JV 
'L  £A03*£UH3  AW3V 
-'TAOJ  R*EOJ V 
-'CAOJ  ER*£AW  023V 
•*£  A0J*£AW3  V 

£3WRULE. IPA  * 'l  £OW3»£UH3  0)  UI3V 
-'£0H3*£0I  UJ IV 

UHRULE.IPA  - 'L  £UH3*EUH3  003  V 
-'EUH3-E003V 

UHRULE.IPA  - 'IUW3-EIU  U3V 

ERRULE.  IPA  - 'IY  £ER3*£I3  ER3V 
-'ER  £ER3=>CIU  RJV 
'L  EER3»EUH3  ER3V 
'IER3  L*EUH3  ERJV 
-'R  CER3=EUH3  RJV 
'£ER3«EERJV 

AX  RULE.  IPA  =*  '£AX3*CUH23V 

AHRULE.IPA  » ■'EAHJ»CUH3V 

AYRULE.IPA  - 'EAY]  L»EAH  AY3V 


* 

■'lAY]  R-EAH  133 V 

+ 

•'fAYJ  ER-CAH  AY3V 

* 

-'CAY3-EAH  E13V 

AHRULE.IPA  « -'tAWJ-CAH  013V 



OYRULE.IPA  » 'L  COY]  ER=CUH3  01  AY]\' 
'L  COY]  L=IUH3  01  AY]\' 

-'L  COY]  R-CUH3  0!  EH21V 
'COY]  ER-COI  AY]\' 

'CQYJ  L=C01  AY] V 
'IQY1  FM01  EH2]\/ 

'COYJ-COI  El]\' 

YRULE.IPA  » ■/CY]*£YI  J\' 

PRULE.IPA  =*  J'CP]=tP]V' 

BRULE.  I PA  * '[B3»CB]\' 

TRULE.IPA  * 'tT3*[T]\' 

D RULE. I PA  » 'CD]a£D]\' 

KRULE.IPA  * 'CK]»CK]V 
G RULE.  I PA  * 'CG3-IG3V 
F RULE.  I PA  = 'CF]*CF]V 
YRULE.IPA  « 'tV]a[V]\' 

THRULE.IPA  =»  'CTH]a[TH]\' 

DHRULE.IPA  » 'CDH]«[THV]V 
SRULE.IPA  » '£S]»IS]\' 

ZRULE.IPA  » 'CZ]a[Z]V 
SHRULE.IPA  * '(SK]«CSH]V 
ZHRULE.  I PA  - 'CZR]a[ZH3V 
HHRULE.  I PA  » 'IHHJatHJV 
CHRULE.IPA  = 'CCH]a(T  CH3V 
JHRULE.IPA  « 'CJH]a[D  J]V 
MRULE.IPA  » 'CM]a[M]V 
NRULE.IPA  » 'CN]»CN]\' 

NXRULE.IPA  » -'[NX]aCNG3V 
LRULE.IPA  * •'lY  CL]*C  13  LJV 
'EY  CL]a[i3  L3V 
'AY  CL]a[I3  L3V 
'OY  £L]»CI3  L3\' 

'AE  CL3*CUH3  L3V 
'AO  ILJ-CUH3  LJV 
'OH  CL]*CUH3  L3\' 

'CLJ*tL]\' 

HRULE.IPA  a ^[W]a[H]\' 

HHRULE.  IPA  - ■'CHHJ-CH  WJV 
RRULE.IPA  - 'CR]  L»[UH3  R3\' 

'I  R]«IR]\' 

PUNCTRULE . I PA  » ✓ !<  >]«[PAO]\' 
'[<«>]*CPA1 ]\' 

'{<.>]«(PAI  PA13V 
'C<?>]aCPAI  PAI  J\' 
'X<->J-IPAUV 
****************************************************************
*
*
*   *****  VOTRAX TO ASCII TRANSLATION RULES  *****
*
*
****************************************************************

* 

* 

BLANK. CODE  * 

P AO. CODE  = 'CH' 

PA! .CODE  » 'NK' 

* 

A . CODE  » '«J' 

A! .CODE  = 'FH' 

A2.C0DE  * 'EH' 

AE.CODE  * 'NJ' 

AEI .CODE  = 'OJ' 

AH.  CODE  = 'DJ' 

AH! .CODE  ■ 'El' 

AH2.CODE  - 'HH' 

AH. CODE  = 'MK' 

AH! .CODE  » 'Cl' 

AH2.C0DE  ■ 'flK' 

AY. CODE  * 'AJ' 

* 

B. CODE  * •'NH' 

* 

CH.CODE  » •'#!■' 

* 

D. CODE  » 'NI' 

DT.CODE  * 'DH' 

* 

E. CODE  * 'LJ' 

El. CODE  * 'LK' 

EH. CODE  = 'KK' 

EH! .CODE  ® 'BH' 

EH2.CODE  * 'AH' 

EH3.C0DE  - 'ttH' 

ER.CODE  * 'JK' 

.* 

F. CODE  * 'MI' 

* 

G. CODE  « 'LI' 

* 

H. CODE  » 'KI' 

* 

I.  CODE  » 'GJ' 

IJ.CODE  » 'KH' 

12. CODE  * 'JH' 
I3.CODE = 'IH'
IU. CODE  * 'FK' 

J. CODE  - 'JI' 

K. CODE  » 'II' 

L. CODE  - 'HI' 

M. CODE  » /LH/ 

N . CODE  » 'MB' 
NG.CODE  a 'DI' 

O. CODE  » 'FJ' 

01 . CODE  - 'EK' 

02.  CODE  « 'DK' 

00.  CODE  » 'GI' 

001.  CODE  * 'FI' 

P. CODE  « 'EJ' 
fl.CODE  « 'KJ' 

S. CODE  « '01' 
SH.CODE  » 'AI' 

T. CODE  - 'JJ' 
TH.CODE  a 'IK' 
THV.CODE  ■ 'HK' 

U. CODE  » 'HJ' 
UI.CODE  = 'GK' 
UH.CODE  a 'CK' 
UH1.C0DE  * 'BK' 
UH2.C0DE  » 'AK' 
UH3.C0DE  » 'CJ' 

V. CODE  » 'OH' 
(♦.CODE  » 'MJ' 

Y. CODE  » 'IJ-' 

Y I, CODE  ■ 'BJ' 

Z. CODE  » 'BI' 
ZH.CODE  » 'GH' 
****************************************************************
*
*
*   DEFINE FUNCTIONS TO BE USED BY THE PROGRAM.
*
*
*   SPECIALBREAK:  BREAKS APART SEGMENTS OF RULES WHICH
*                  CONTAIN SPECIAL CASE SYMBOLS
*                  VOWEL OR CONSONANT CLASSES, ETC.
*
        DEFINE('SPECIALBREAK(STR)')
*
*   TRANSLATETEXT:  CALLS TRANSLATE TO TRANSLATE THE TEXT.
*                   PARAMETER IS THE ENGLISH TEXT.
*
        DEFINE('TRANSLATETEXT(TEXT)')
*
*   TRANSLATE:  BREAKS OFF SEGMENTS OF A SET
*               OF TRANSLATION RULES AND DETERMINES
*               WHETHER THEY APPLY TO TEXT.
*
        DEFINE('TRANSLATE(BUF,GRAPHEME,QUAL)')
*
*   VOTRAXTRANSLATE:  TRANSLATES A STRING OF IPA SYMBOLS
*                     TO VOTRAX PHONETICS ACCORDING TO
*                     A SET OF PREDEFINED RULES.
*
        DEFINE('VOTRAXTRANSLATE(IPAPHONEMES)')
*
*   READTEXT:  INPUT THE COMPLETE TEXT TO BE TRANSLATED
*
        DEFINE('READTEXT()')
*
*   ASCII:  TRANSLATES THE VOTRAX MNEMONIC TO ASCII.
*
        DEFINE('ASCII(STRING)')
*
*   FILEDEFINE:  THIS IS THE MODULE WHICH ASKS THE USER
*                THE NAMES OF THE INPUT FILE AND RESULT FILE
*                AND THE STATISTICS FILE AND MAKES
*                VARIABLE ASSIGNMENTS.
*
        DEFINE('FILEDEFINE()')
*
*   FILEOUT:  THIS ROUTINE OUTPUTS THE MNEMONIC VOTRAX
*             CODE TO A FILE ASSOCIATED WITH TRANSTEXT.
*
        DEFINE('FILEOUT(BUF)')



*   CLI:  INPUTS THE TRANSLATION TO BE DONE
*         BUILDS THE VARIABLE BRANCH IN IN AND OUT
*
        DEFINE('CLI()')
*
*
*   MAIN PROGRAM CODE STARTS HERE
*
*
*   SET TRIM VARIABLE SO TRAILING BLANKS ARE AUTOMATICALLY DELETED.
        &TRIM = 1
*
*   INIT SOME VARIABLES.
*
        INPUT('INPUT',2,80)
        BLANK = ' '
        DOUBLEBLANK = '  '
        NULL =
        ENDTEXT = '#'
        ESCAPECODE = '###'
        QUOTE = '"'
        SINGLEQUOTE = "'"
        SPECIALCASE = '#*.$%&@^+:'
        ILLEGALPUNCT = '[]/\'

PUNCTSYM8QL  * ' ,.?<»+*«$ %&-<>!  ()--'  SINGLEQUOTE 
PUNCTSYMNOBLANK  =*  ',.?H+*«$X&«-<>!  ()*  SINGLEQUOrE 
        NUMBER = '1234567890'
        SET = 'ON'
        UNSET = 'OFF'
*
*   DEFINE THE DELETE CHARACTER BY USING THE MACHINES ALPHABET.
*   ALSO DEFINE RECORDON AND RECORDOFF AND ENDOFMSG.
*
        &ALPHABET
+         TAB(18) LEN(1) . RECORDON
+         TAB(20) LEN(1) . RECORDOFF
+         TAB(94) LEN(1) . ENDOFMESSAGE
+         TAB(127) LEN(1) . DELETE
*
*   DEFINE SOME PATTERNS USED IN THE PROGRAM.
*   THIS MAY SAVE THE BUILDING TIME DURING PROGRAM EXECUTION.
*
        ENDTEXTTEST = ENDTEXT RPOS(0)
        RULEBREAKPATTERN = BREAK('\') . RULE '\'
        RULECHARSEP = BREAK('[') . BACKCHAR '['
+         BREAK(']') . CHARDEF ']'
+         BREAK('=') . FORCHAR
+         REM . PHONEME
        VOWEL = 'AEIOUY'
        CONSONANT = 'BCDFGHJKLMNPQRSTVWXZ'
        VOICED = 'BDVGJLMNRWZ'
        FRONT = 'EIY'
        SUFFIX = 'ER ' ! 'E ' ! 'ES ' ! 'ED ' ! 'ING '
        SIBILANT = ANY('SCGZXJ') ! 'CH' ! 'SH'
        NONPAL = ANY('TSRDLZNJ') ! 'TH' ! 'CH' ! 'SH'
        $'PATTERN#' = ANY(VOWEL) ARBNO(ANY(VOWEL))
        $'PATTERN*' = ANY(CONSONANT) ARBNO(ANY(CONSONANT))
        $'PATTERN.' = ANY(VOICED)
        $'PATTERN$' = ANY(CONSONANT) ANY('EI')
        $'PATTERN%' = SUFFIX
        $'PATTERN&' = SIBILANT
        $'PATTERN@' = NONPAL
        $'PATTERN^' = ANY(CONSONANT)
        $'PATTERN+' = ANY(FRONT)
        $'PATTERN:' = ARBNO(ANY(CONSONANT))
        TTY = POS(0) 'TTY' RPOS(0)
        CAS = POS(0) 'CAS' RPOS(0)
        NOANS = POS(0) ('N' ! 'NO') RPOS(0)
        YESANS = POS(0) ('Y' ! 'YE' ! 'YES') RPOS(0)
        ENGLISH = 'ENGLISH' ! 'ENGLIS' ! 'ENGLI' !
+         'ENGL' ! 'ENG' ! 'EN' ! 'E'
        IPA = 'IPA' ! 'IP' ! 'I'
        VOTRA = 'VOTRAX' ! 'VOTRA' ! 'VOTR' ! 'VOT' ! 'VO' ! 'V'
        ASCI = 'ASCII' ! 'ASCI' ! 'ASC' ! 'AS' ! 'A'
*
        OUTPUT = ' START OF PROGRAM -- TRANS.'
+         ' LAST UPDATE APRIL 8, 1975'
*
*
*   DECLARE MAX LENGTH OF STRINGS AND NUMBER OF STATEMENT
*   EXECUTIONS SO SNOBOL DOESN'T BOMB.
*
        &STLIMIT = 100000000
        &MAXLNGTH = 50000
*   DEFINE FILE INPUT AND OUTPUT VARIABLES.
*
EOF     FILEDEFINE()

*
*   CALL ROUTINE TO DETERMINE WHAT TRANSLATIONS ARE TO BE DONE.
*
RECOMMAND  CLI()
*
*   READ IN THE TEXT TO BE TRANSLATED.
*   AFTER INPUTTING TEXT BRANCH TO CODE BASED ON TRANSLATION.
*
        INDIRECT = IN OUT
        OUTPUT = ' TRANSLATION OF ' IN ' TO ' OUT ' BEGINNING'
*
*   INSERT A BLANK BEFORE INPUT STRING TO DELIMIT FIRST WORD.
*
REREED  ALLTEXT = READTEXT()                         :F(EOF)S($INDIRECT)
*
*
*   DEFINE THE TRANSLATIONS WHICH ARE REFLEXIVE AND DO NOT
*   ACTUALLY REQUIRE TRANSLATION--
*   THEY JUST OUTPUT THE INPUT TEXT.
*
ENGLISHENGLISH
**      OUTPUT = ' TRANS FROM ENG TO ENG '
+                                                    :(MSG)
IPAIPA
**      OUTPUT = ' TRANS FROM IPA TO IPA '
+                                                    :(MSG)
VOTRAXVOTRAX
**      OUTPUT = ' TRANS FROM VOTRAX TO VOTRAX '
+                                                    :(MSG)
ASCIIASCII
**      OUTPUT = ' TRANS FROM ASCII TO ASCII '
+                                                    :(MSG)
*
MSG     TTYFLAG SET                                  :S(TTYOUTO)
        FILEOUT(ALLTEXT ENDOFMESSAGE)                :(REREED)
TTYOUTO TRANSTEXT = ' RESULT IS ' ALLTEXT            :(REREED)
*   DEFINE THE ROUTINES WHICH ARE NOT IMPLEMENTED.
*   THESE ARE ASCII TO VOTRAX
*             ASCII TO IPA
*             ASCII TO ENGLISH
*             VOTRAX TO IPA
*             VOTRAX TO ENGLISH
*             AND IPA TO ENGLISH.
*
*
IPAENGLISH  OUTPUT = ' TRANSLATION OF IPA TO ENG NOT IMPLEMENTED'
+                                                    :(RECOMMAND)
ASCIIVOTRAX  OUTPUT = ' TRANS OF ASCII TO VOTRAX NOT IMPLEMENTED'
+                                                    :(RECOMMAND)
ASCIIIPA  OUTPUT = ' TRANS OF ASCII TO IPA NOT IMPLEMENTED'
+                                                    :(RECOMMAND)
ASCIIENGLISH  OUTPUT = ' TRANS OF ASCII TO ENG NOT IMPLEMENTED'
+                                                    :(RECOMMAND)
VOTRAXIPA  OUTPUT = ' TRANS OF VOTRAX TO IPA NOT IMPLEMENTED '
+                                                    :(RECOMMAND)
VOTRAXENGLISH  OUTPUT = ' TRANS OF VOTRAX TO ENG NOT IMPLEMENTED'
+                                                    :(RECOMMAND)

*
*
*   TO TRANSLATE FROM IPA TO VOTRAX AND ASCII.
*   TO TRANSLATE FROM VOTRAX TO ASCII.
*
*
IPAASCII
**      OUTPUT = ' TRANS OF IPA TO ASCII'
*
*   REMOVE END OF TEXT MARKER.
*
        ALLTEXT ENDTEXTTEST = BLANK
*
*   CALL ROUTINE TO TRANS FROM IPA TO VOTRAX CODES.
*
        VOTRAXSYMBOLS = VOTRAXTRANSLATE(ALLTEXT)
*
*   CALL ROUTINE TO TRANS TO ASCII.
*
        ASCIIRESULT = ASCII(VOTRAXSYMBOLS)
*
*   SEE IF SHOULD OUTPUT IN FORMAT FOR SPEECH LAB.
*
        TTYFLAG SET                                  :S(TTYOUT)
        FILEOUT(ASCIIRESULT ENDOFMESSAGE)            :(REREED)
TTYOUT  TRANSTEXT = ' ASCII RESULT IS ' ASCIIRESULT  :(REREED)

* 

* 
IPAVOTRAX
**      OUTPUT = ' TRANS OF IPA TO VOTRAX '
*
*   REMOVE END TEXT MARKER.
*
        ALLTEXT ENDTEXTTEST = BLANK
*
*   TRANSLATE THE STRING.
*
        VOTRAXSYMBOLS = VOTRAXTRANSLATE(ALLTEXT)
*
*   SEE IF SHOULD OUTPUT TO CASSETTE.
*
        TTYFLAG SET                                  :F(FILE2)
        TRANSTEXT = ' THE VOTRAX RESULT IS '
        TRANSTEXT = ' ' VOTRAXSYMBOLS                :(REREED)
FILE2   FILEOUT(VOTRAXSYMBOLS ENDOFMESSAGE)          :(REREED)
*
ENGLISHVOTRAX
**      OUTPUT = ' TRANS OF ENG TO VOTRAX '
*
*   SET FLAG TO SAY TRANS TO VOTRAX ALSO.
*
        VOTRAXFLAG = SET                             :(ENGVOTRAX)


I rac3ii!:USI0N  0F  ™NS  10  vo™x- 


ENDENGVOTRAX  TTYFLAG  SET 

TRANSTEXT  * * VOTRAX  RESULT  IS  * 
TRANSTEXT  a * VOTRAXSYMBOLS 
FILE3  VOTRAXSYMBOLS  * REPLACE(VOTRAXSYMBOLS.'[ 
FILED UT( VOTRAXSYMBOLS  ENDOFMESSAGE) 


*F(FILE3) 

* ( REREED) 
') 

t (REREED’ 
ENGLISHASCII
**      OUTPUT = ' TRANS OF ENG TO ASCII '
*
*   SET FLAGS FOR ASCII TRANS AND FOR VOTRAX FLAGS.
*
        ASCIIFLAG = SET
        VOTRAXFLAG = SET                             :(ENGVOTRAX)
*
*   RETURN HERE AT COMPLETION OF TRANSLATION.
*   SEE IF SHOULD OUTPUT TO CAS.
*
ENDENGASCII  TTYFLAG SET                             :F(FILE4)
        TRANSTEXT = ' ASCII RESULT IS '
        TRANSTEXT = ' ' ASCIIRESULT                  :(REREED)
FILE4   FILEOUT(ASCIIRESULT ENDOFMESSAGE)            :(REREED)


*
VOTRAXASCII
**      OUTPUT = ' TRANS OF VOTRAX TO ASCII'
*
*   REMOVE END TEXT MARKER.
*
        ALLTEXT ENDTEXTTEST = NULL
*
*   CALL ROUTINE TO TRANSLATE.
*
        ASCIIRESULT = ASCII(ALLTEXT)                 :F(REREED)
*
*   SEE IF SHOULD OUTPUT TO CAS.
*
        TTYFLAG SET                                  :F(FILE5)
        TRANSTEXT = ' ' ASCIIRESULT                  :(REREED)
FILE5   FILEOUT(ASCIIRESULT ENDOFMESSAGE)            :(REREED)



ENGLISHIPA
**      OUTPUT = ' TRANS OF ENG TO IPA '
*
*   BRANCH HERE IF TRANS OF ENG TO VOTRAX OR ASCII.
*
ENGVOTRAX  IPARESULT = NULL
        IPARESULT = TRANSLATETEXT(ALLTEXT)
*
*   SEE IF WE ARE TO TRANS TO VOTRAX.
*
VOTRAXCALL  VOTRAXFLAG SET = UNSET                   :F(ENDENGIPA)
*
*   FLAG WAS SET SO TRANS TO VOTRAX.
*   TRANS THE STRING.
*
        VOTRAXSYMBOLS = VOTRAXTRANSLATE(IPARESULT)
*
*   IS ASCII FLAG SET?
*
        ASCIIFLAG SET = UNSET                        :F(ENDENGVOTRAX)
*
*   YES--CALL ASCII ROUTINE.
*
        ASCIIRESULT = ASCII(VOTRAXSYMBOLS)           :(ENDENGASCII)
*
*   COME HERE IF NOT TRANS TO VOTRAX OR ASCII.
*   SEE IF SHOULD OUTPUT TO CAS.
*
ENDENGIPA  TTYFLAG SET                               :F(FILE6)
        TRANSTEXT = ' IPA RESULT IS '
        TRANSTEXT = ' ' IPARESULT                    :(REREED)
FILE6   FILEOUT(IPARESULT ENDOFMESSAGE)              :(REREED)
*
*
****************************************************************
*
*   TRANSLATETEXT
*
****************************************************************
*
*
*   START THE SCAN AT FIRST CHARACTER OF INPUT.
*   POSITION 0 IS A BLANK INSERTED BY THE PROGRAM TO DELIMIT
*   THE FIRST WORD.
*
TRANSLATETEXT  I = 1
        TRANSLATETEXT = NULL
*
*   PICK OFF ONE CHARACTER OF 'TEXT' AT ITH POSITION.
*
NEXTCHAR  TEXT POS(I) LEN(1) . CHAR
*
*   TEST FOR END TEXT MARKER -- IF SO RETURN.
*
        CHAR ENDTEXT                                 :S(RETURN)
*
*   CONCATENATE THE PHONEME WHICH IS RETURNED BY 'TRANSLATE'.
*
        TRANSLATETEXT = TRANSLATETEXT TRANSLATE(TEXT,CHAR,'ENG')
*
*   INCREMENT THE POINTER TO THE NEXT CHARACTER IN 'TEXT' TO BE
*   TRANSLATED.
*   'INCVALUE' SET BY ROUTINE 'TRANSLATE'.
*
        I = I + INCVALUE                             :(NEXTCHAR)
*
*
****************************************************************
*
*   TRANSLATE
*
*   THIS ROUTINE DOES THE ACTUAL TRANSLATION OF THE LETTER
*   PASSED BY THE MAIN PROGRAM IN 'CHAR' BY CHOOSING THE RULE
*   WHICH APPLIES TO THE CONTENTS OF 'TEXT' AND PASSING BACK THE
*   PHONEME.
*   ADDITIONALLY, TRANSLATE SETS A VARIABLE 'INCVALUE' TO THE
*   NUMBER OF SYMBOLS REPLACED SO THAT THE
*   MAIN ROUTINE MAY INCREMENT THE POINTER INTO 'TEXT'.
*
****************************************************************
*
*
*   SET OF SPECIAL CASE SYMBOLS.
*   #  =  1 OR MORE VOWELS
*   *  =  1 OR MORE CONSONANTS
*   .  =  A VOICED CONSONANT
*   $  =  SINGLE CONSONANT FOLLOWED BY AN 'I' OR 'E'
*   %  =  SUFFIX SUCH AS 'E','ES','ED','ER','ING','ELY'
*   &  =  A SIBILANT
*   @  =  A CONSONANT AFTER WHICH LONG 'U' IS PRONOUNCED
*         AS IN 'RULE', NOT 'MULE'
*   ^  =  A SINGLE CONSONANT
*   +  =  A FRONT VOWEL: 'E','I','Y'
*   :  =  0 OR MORE CONSONANTS
*
*   SPECIALCASE = '#*.$%&@^+:'

* PUNCTSYMBOL  - ' ,.?| *+*"$».-<>! <*'  SINGLEQUOTE 

*
*
TRANSLATE  GRAPHEME ANY(PUNCTSYMBOL) REM = 'PUNCT'
        GRAPHEME ANY(NUMBER) REM = 'NUMBER'
*
*   COPY THE SET OF POSSIBLE RULES FOR THE CHARACTER PASSED.
*
        GRRULE = $(GRAPHEME 'RULE.' QUAL)
*
*   BREAK OFF ONE OF THE RULES.
*   RULEBREAKPATTERN = BREAK('\') . RULE '\'
*
NEXTRULE  GRRULE RULEBREAKPATTERN = NULL             :F(NORULEAPPLIES)
*
* 
*   BREAK THE RULE INTO ITS COMPONENT PIECES OF THE PATTERN
*   TO MATCH AND THE PHONEMES TO REPLACE IF MATCH IS A SUCCESS.
*   BREAK RULE INTO PIECES OF STRING BEFORE THE SYMBOL TRANSLATING,
*   STRING OF LETTERS TO BE REPLACED BY 'PHONEME' IF MATCH OCCURS,
*   AND STRING OF LETTERS AFTER THE REPLACEMENT LETTERS.
*
*   RULECHARSEP = BREAK('[') . BACKCHAR '['
*+                BREAK(']') . CHARDEF ']'
*+                BREAK('=') . FORCHAR
*+                REM . PHONEME
*
        RULE RULECHARSEP
*
*   A CHECK OF THE RULE MUST BE MADE IN CASE A CHARACTER OTHER THAN
*   AN ALPHABETIC OR BRACKET APPEARS.
*   WHEN THIS OCCURS THE RULE HAS A SPECIAL CASE SUCH AS A VOWEL OR
*   CONSONANT SEQUENCE (#,*) OR A VOICED CONSONANT (.).
*   WHEN ONE OF THESE SPECIAL CHARACTERS APPEARS IN THE RULE, THE
*   ROUTINE 'SPECIALCASEPROC' BUILDS A PATTERN TO MATCH 'TEXT'.
*   OTHERWISE THE PIECES OF THE RULE ARE USED EXPLICITLY AS BELOW.
*
*
*   IF A SPECIAL CHAR IS FOUND IN THE RULE GO TO THE SPECIAL PROC.
*
        (FORCHAR BACKCHAR) ANY(SPECIALCASE)          :S(SPECIALCASEPROC)
*
*   NO SPECIAL SYMBOLS APPEARED IN THE RULE.
*   MATCH ON THE PIECES.  IF FAIL GET NEXT RULE.
*   DETERMINE WHERE TO BEGIN MATCH BY BACKING UP IN BUF
*   THE NUMBER OF CHARS IN BACKCHAR.
*
        BACK = GE(I,SIZE(BACKCHAR)) SIZE(BACKCHAR)
+                                                    :F(NEXTRULE)
        BUF POS(I - BACK) BACKCHAR CHARDEF FORCHAR
+                                                    :F(NEXTRULE)
*
★ 
*   MATCH WAS MADE.
*   RETURN THE PHONEME SEQUENCE AS SPECIFIED BY THE RULE.
*   DETERMINE THE AMOUNT TO INCREMENT THE POINTER.
*   THIS VALUE COMPUTED BASED ON NO. CHARS IN CHARDEF.
*
INCSET  INCVALUE = SIZE(CHARDEF)
*
**      OUTPUT = ' RULE USED WAS <' RULE
**      OUTPUT = ' PHONEME IS <' PHONEME '>'
*   GATHER STATISTIC AT THIS POINT.
*   SEE IF STATFLAG IS SET.  IF SO OUTPUT RESULTS.
*
        STATFLAG SET                                 :F(TRANSRET)
        STATISTICS = RULE
*
TRANSRET  TRANSLATE = PHONEME                        :(RETURN)
*
*
*   SPECIALCASEPROC:
*
*   THIS IS THE SECTION WHICH TAKES CARE OF THE SPECIAL CASE RULES.
*   IT CREATES PATTERNS FOR THE SPECIAL CASES BY CALLING THE
*   FUNCTION 'SPECIALBREAK' WHICH BUILDS A PATTERN BASED ON
*   THE SPECIAL CHARACTERS IN THE STRING PASSED AS THE PARAMETER.
*   ON FAILURE TO MATCH THE PATTERN ANOTHER RULE WILL BE TRIED.
*
*   RULES MUST NOT HAVE SPECIAL CASES INTERNAL TO THE
*   BRACKETS, I.E. IN 'CHARDEF'.
*   IF THEY DO THEN THE PROGRAM MUST BE REVISED TO HANDLE
*   THE CASE BY USING 'SPECIALBREAK' ON 'CHARDEF' ALSO.
*   A RULE IS OF THE FORM:
*        A[B]C=/PHONEMES/
*   WHERE A AND C ARE STRINGS OF ALPHABETICS OR
*        SPECIAL SYMBOLS
*   AND B IS A STRING OF ALPHABETIC ONLY.
*
*   CREATE A PATTERN FOR SPECIALCASES BY CALLING 'SPECIALBREAK'
*   POSITION POINTER AND CHARACTERS TRYING TO MATCH ('CHARDEF'),
*   CALL SPECIALBREAK WITH FORCHAR.
*   ON FAILURE TO MATCH GET ANOTHER RULE.
*
SPECIALCASEPROC
+       BUF SPECIALBREAK(BACKCHAR) POS(I) CHARDEF
+         SPECIALBREAK(FORCHAR)                      :S(INCSET)F(NEXTRULE)
*
*   ON SUCCESS RETURN PHONEMES TO MAIN PROGRAM.
*
*
****************************************************************
*
*   SPECIALBREAK
*
*   THIS FUNCTION BUILDS A PATTERN MATCH BASED ON THE PIECES OF THE
*   RULE PASSED TO IT AS A PARAMETER.
*
****************************************************************
*
*
*   DEFINE PIECES OF THE PATTERN BASED ON THE SPECIAL CHARACTER
*   ENCOUNTERED IN PARAMETER.
*
SPECIALBREAK  PATTERN =
*
*
*   VOWEL = 'AEIOUY'
*   CONSONANT = 'BCDFGHJKLMNPQRSTVWXZ'
*   VOICED = 'BDVGJLMNRWZ'
*   FRONT = 'EIY'
*   SUFFIX = 'ER ' ! 'E ' ! 'ES ' ! 'ED ' ! 'ING ' ! 'ELY '
*   SIBILANT = ANY('SCGZXJ') ! 'CH' ! 'SH'
*   NONPAL = ANY('TSRDLZNJ') ! 'TH' ! 'CH' ! 'SH'
*   $'PATTERN#' = ANY(VOWEL) ARBNO(ANY(VOWEL))
*   $'PATTERN*' = ANY(CONSONANT) ARBNO(ANY(CONSONANT))
*   $'PATTERN.' = ANY(VOICED)
*   $'PATTERN$' = ANY(CONSONANT) ANY('EI')
*   $'PATTERN%' = SUFFIX
*   $'PATTERN&' = SIBILANT
*   $'PATTERN@' = NONPAL
*   $'PATTERN^' = ANY(CONSONANT)
*   $'PATTERN+' = ANY(FRONT)
*   $'PATTERN:' = ARBNO(ANY(CONSONANT))
*
*   REPLACE EVERYTHING UP TO SPECIAL CHARACTER BY NULL AND ASSIGN
*   WHAT MATCHED TO 'PATTERN1'.
*
REMATCH  STR BREAK(SPECIALCASE) . PATTERN1 =         :F(ALLDONE)
*
*   BREAK OFF THE SPECIAL CASE CHAR INTO SYM.
*   REPLACE IT BY THE NULL STRING.
*
        STR LEN(1) . SYM =
*
*   BUILD PATTERN TO PASS BACK TO CALLER BASED ON PREVIOUSLY
*   BUILT PARTIAL PATTERN AND PATTERN BASED ON THE SPECIAL SYMBOL
*   STORED IN 'SYM'.  LOOP TO REMATCH UNTIL NOTHING LEFT IN STR OR
*   NO MORE SPECIAL CHARACTERS.
*
        PATTERN = PATTERN PATTERN1 $('PATTERN' SYM)  :(REMATCH)
*
*   RETURN WITH PATTERN THAT WAS BUILT.
*   THE REMAINDER OF 'STR' HAS NO SPECIAL CHARACTERS IN IT.
*
ALLDONE  SPECIALBREAK = PATTERN STR                  :(RETURN)
*
*
****************************************************************
*
*   VOTRAXTRANSLATE
*
*   TRANSLATES FROM IPA NOTATION TO VOTRAX SYMBOLS.
*   PARAMETERS ARE THE STRING TO BE TRANSLATED.  EACH PHONEME
*   MAY BE DELIMITED BY SLASHES.
*
****************************************************************
*
*
VOTRAXTRANSLATE  VOTRAXSTR = NULL
        I = 1
        ENDIPASTR = '#'
        IPASTR = IPAPHONEMES ENDIPASTR
        IPASTR = REPLACE(IPASTR,'/',BLANK)
*
*   REMOVE DOUBLE BLANKS.
*
REMOVEBLANKS  IPASTR DOUBLEBLANK = BLANK             :S(REMOVEBLANKS)
TRY     IPASTR POS(I) ENDIPASTR                      :S(DONEVOTRAX)
        IPASTR POS(I) BREAK(BLANK) . IPASYM
DIFFERENT  VOTRAXSTR = VOTRAXSTR TRANSLATE(IPASTR,IPASYM,'IPA')
        I = I + INCVALUE + 1                         :(TRY)
DONEVOTRAX  VOTRAXTRANSLATE = VOTRAXSTR              :(RETURN)
*
*
****************************************************************
*
*    READTEXT
*
*    READ A SERIES OF TEXT TO BE TRANSLATED.
*    TERMINATE IT BY A # .
*
****************************************************************
*
*
READTEXT  TOTALTEXT =

* 

* 

* ILLEGALPUNCT  » -'Cl/V 

* PUNCTSYMNOBLANK  » -'.A  »+<>?★«- 1)  <4X$"K  SINGLEQUOTE 

* QUOTE  » ^ 

*
*
*    SKIP MESSAGE IF INPUT IS A FILE.
*
          INFILE TTY  :F(REREAD)
          OUTPUT = ' ENTER TEXT TERMINATED BY A ' ENDTEXT
*
REREAD    TOTALTEXT = TOTALTEXT INPUTTEXT BLANK  :F(FRETURN)
          TOTALTEXT ENDTEXT  :F(REREAD)
*
*    SEE IF USER WISHES TO REDEFINE INPUT FILES AND OTHERS.
*    TEST FOR INPUT FROM TTY OR INPUT FILE TO BE END MARKS.
*    THE END OF FILE MARK IS ### STARTING IN FIRST CHAR POSITION.
*
          TOTALTEXT ESCAPECODE  :S(FRETURN)
          TOTALTEXT ENDTEXT REM = BLANK ENDTEXT
*
*    REMOVE ILLEGAL PUNCTUATION FROM STRING.
*
TEST      TOTALTEXT ANY(ILLEGALPUNCT) = BLANK  :S(TEST)
*
*    INSERT BLANKS ON EITHER SIDE OF ANY PUNCTUATION APPEARING
*    IN THE INPUT TEXT SO EACH WORD IS DELIMITED.
*
          T = 0
HERE      TOTALTEXT POS(T) BREAK(PUNCTSYMNOBLANK) $ T1
+         SPAN(PUNCTSYMBOL) $ T2
+         = T1 BLANK T2 BLANK  :F(TEST2)
          T = SIZE(T1 T2) + T + 1  :(HERE)
TEST2     TOTALTEXT = BLANK TOTALTEXT
*
*    REMOVE MULTIPLE BLANKS AND REPLACE BY SINGLE BLANK.
*
TEST3     TOTALTEXT DOUBLEBLANK = BLANK  :S(TEST3)
*
*    SEE IF FLAG THAT SAYS TO OUTPUT THE INPUT TEXT TO CASSETTE ON.
*
          TEXTFLAG SET  :F(STATTEST)
*
*    REMOVE END OF TEXT MARKER BEFORE WRITING TO CASSETTE.
*
          TEMPTEXT = TOTALTEXT
          TEMPTEXT ENDTEXT = NULL
*
*    INSERT A QUOTE MARK BEFORE AND AFTER TEXT TO BE WRITTEN TO CAS.
*
          FILEOUT(QUOTE TEMPTEXT QUOTE)
*
*    SEE IF STATFLAG SET. IF SO OUTPUT THE TEXT TO STAT FILE.
*
STATTEST  STATFLAG SET  :F(RET)
*
*    INSERT THE ACTUAL TEXT TO BE TRANSLATED TO THE STAT FILE.
*
          STATISTICS = TOTALTEXT
*
RET       READTEXT = TOTALTEXT  :(RETURN)
*
*
****************************************************************
*
*    ASCII
*
*    THIS TRANSLATES TO ASCII.
*
****************************************************************
*
*
ASCII     ASCII = NULL
*
*    REMOVE LEFT AND RIGHT BRACKETS.
*
          STRING = REPLACE(STRING,'[]','  ')
*
*    INSERT AN END BLANK SO FOLLOWING BREAK WILL WORK ON LAST WORD.
*
          STRING = STRING BLANK
*
*    GET RID OF DOUBLE BLANKS SO THAT BREAK WILL NOT GET NULL SYMBOL.
*
AGAIN     STRING DOUBLEBLANK = BLANK  :S(AGAIN)
*
*    REMOVE INITIAL BLANK IF ANY SO BREAK WON'T BREAK BEFORE IT.
*
          STRING POS(0) BLANK = NULL
LOOP      STRING BREAK(BLANK) . ASCIISYM BLANK =  :F(RETURN)
          ASCIISYM BLANK = 'BLANK'
          ASCII = ASCII DIFFER(NULL,$(ASCIISYM '.CODE'))
+         $(ASCIISYM '.CODE')  :S(LOOP)
*
*
****************************************************************
*
*    CLI
*
*    THIS INPUTS THE KIND OF TRANSLATION WANTED
*    THEN BUILDS VARIABLES IN 'IN' AND 'OUT'
*    TO TRANSFER INDIRECT TO THE CODE.
*
****************************************************************
*
*
CLI       ENGLISH = 'ENGLISH' ! 'ENGLIS' ! 'ENGLI' ! 'ENGL' ! 'ENG' !
+         'EN' ! 'E'
          IPA = 'IPA' ! 'IP' ! 'I'
          VOTRA = 'VOTRAX' ! 'VOTRA' ! 'VOTR' ! 'VOT' ! 'VO' ! 'V'
          ASCI = 'ASCII' ! 'ASCI' ! 'ASC' ! 'AS' ! 'A'
          RESPONSE = NULL
*
CLIRETRY  OUTPUT = ' WHAT TRANSLATION DO YOU WANT?'
          RESPONSE = INPUT BLANK
          RESPONSE BREAK(PUNCTSYMBOL) . IN ANY(PUNCTSYMBOL)
+         BREAK(PUNCTSYMBOL) . OUT
          IN POS(0) ENGLISH RPOS(0) = 'ENGLISH'  :S(OUTTEST)
          IN POS(0) IPA RPOS(0) = 'IPA'  :S(OUTTEST)
          IN POS(0) VOTRA RPOS(0) = 'VOTRAX'  :S(OUTTEST)
          IN POS(0) ASCI RPOS(0) = 'ASCII'  :S(OUTTEST)F(ERRORIN)
*
OUTTEST   OUT POS(0) ENGLISH RPOS(0) = 'ENGLISH'  :S(RETURN)
          OUT POS(0) IPA RPOS(0) = 'IPA'  :S(RETURN)
          OUT POS(0) VOTRA RPOS(0) = 'VOTRAX'  :S(RETURN)
          OUT POS(0) ASCI RPOS(0) = 'ASCII'  :S(RETURN)F(ERROROUT)
*
ERRORIN   OUTPUT = ' INITIAL TRANSLATION PARAMETER ILLEGAL'
          OUTPUT = ' PARAMETER IS ' IN  :(CLIRETRY)
ERROROUT  OUTPUT = ' FINAL TRANSLATION PARAMETER ERROR'
          OUTPUT = ' PARAMETER IS ' OUT  :(CLIRETRY)
****************************************************************
*
*    PREDEFINE
*
****************************************************************
*
*    DEFINE FILENUMBERS FOR THE INPUT FILE THE STAT FILE AND THE
*    TRANS FILE. THESE ARE USED IN THE VARIABLE ASSIGNMENTS TO
*    INDICATE THE I/O.
*
FILEDEFINE INNO = 22
          STATNO = 23
          TRANSNO = 24
          STATFLAG = UNSET
          TTYFLAG = UNSET
          TEXTFLAG = UNSET
*
          OUTPUT = ' WHAT IS THE INPUT FILE NAME?'
          INFILE = INPUT
*
*    SEE IF IT IS THE TTY (INPUT DEVICE USING).
*
          INFILE TTY  :F(OKAY1)
*
*    YES IT IS SO REDEFINE INPUT FILE TO TTY - 2 ON THIS SYSTEM.
*
          INNO = 2  :(SOK1)
*
*    THE DEVICE IS NOT THE TTY SO MAKE CORRESPONDENCE WITH FILE
*    NAME AND DEVICE NUMBER.
*
OKAY1     IFILE(INNO,INFILE)
*
*    GET TRANSFILE NAME.
*
SOK1      OUTPUT = ' WHAT IS THE FILENAME FOR THE '
+         'TRANSLATION RESULTS?'
          TFILE = INPUT
          IDENT(TFILE,NULL)  :F(NEXT)
          ENDFILE(TRANSNO)  :(NEXTQ)
NEXT      TFILE TTY  :F(CASTEST)
*
*    REDEFINE FILENO TO TTY.
*
          TRANSNO = 2
*
*    SET FLAG TO SAY TTY OUTPUT.
*
          TTYFLAG = SET  :(OKAY2)
*
*    SEE IF DEVICE IS THE CASSETTE.
*    SEE IF USER WISHES ORIGINAL TEXT TO BE WRITTEN.
*
CASTEST   TFILE CAS  :F(ASKAGAIN)
*
*    YES IT IS CASSETTE SO DEFINE NO. TO TTY AND SET FLAG.
*
          TRANSNO = 2
ASKAGAIN  OUTPUT = ' TEXT TO FILE, TOO?'
          CASANS = INPUT
*
*    SEE IF ANSWER YES.
*
          CASANS NOANS  :S(OKAY2)
*
*    MAKE SURE ANSWER IS YES AND NOTHING ELSE.
*
          CASANS YESANS  :F(ASKAGAIN)
*
*    ALL IS OKAY AND ANSWER WAS YES. SET FLAG.
*
          TEXTFLAG = SET
*
OKAY2     TFILE POS(0) OLDTFILE RPOS(0)  :S(NEXTQ)
*
*    THIS IS A NEW FILE SO SAVE ITS NAME.
*
          OLDTFILE = TFILE
*
*    CLOSE THE OLD FILE.
*
          ENDFILE(TRANSNO)
*
*    MAKE NEW ASSIGNMENT.
*
          OFILE(TRANSNO,TFILE)
*
NEXTQ     OUTPUT = ' DO YOU WANT TO GATHER STATISTICS?'
          ANS = INPUT
          ANS NOANS  :S(DEF)
*
*    STATISTICS ARE WANTED.
*
          OUTPUT = ' WHAT IS THE FILENAME?'
          STFILE = INPUT
          IDENT(STFILE,NULL)  :S(NEXTQ)
*
*    SET FLAG TO INDICATE STAT GATHER.
*
          STATFLAG = SET
*
*    SEE IF STATS ARE TO BE SENT TO TTY.
*
          STFILE TTY  :F(OKAY3)
          STATNO = 2
OKAY3     STFILE POS(0) OLDSTFILE RPOS(0)  :S(DEF)
*
*    NOT THE SAME STAT FILE SO SAVE THE NAME.
*
          OLDSTFILE = STFILE
*
*    CLOSE THE OLD STAT FILE.
*
          ENDFILE(STATNO)
*
*    REDEFINE THE STAT FILE NAME.
*
          OFILE(STATNO,STFILE)
*
*    SET UP VARIABLE ASSOCIATIONS.
*
DEF       INPUT('INPUTTEXT',INNO,80)
          OUTPUT('STATISTICS',STATNO,'(1X,15A5)')
          OUTPUT('TRANSTEXT',TRANSNO,'(1X,15A5)')  :(RETURN)
****************************************************************
*
*    FILEOUT
*
*    THIS ROUTINE OUTPUTS VOTRAX MNEMONIC CODES TO A FILE.
*    EACH CODE IS SEPARATED BY A BLANK.
*    THE SEQUENCE IS PRECEDED BY A RECORD ON CODE MEANT
*    FOR THE 733 ASR CASSETTE TO TURN ON THE CASSETTE.
*    THE MESSAGE IS ENDED BY AN END OF MESSAGE CHARACTER
*    WHICH HAS MEANING TO THE SPEECH LAB PROGRAMS RUNNING
*    ON THE TI 960A, FOLLOWED BY A DELETE CODE TO WRITE OVER THE
*    RECORD OFF IN THE CASSETTE BUF, AND THE FINAL CODE IS A
*    RECORD OFF TO SHUT THE CASSETTE OFF.
*
*    DC2 - RECORDON
*    DC4 - RECORDOFF
*          IS USED BY THE TI SPEECH LAB AS AN
*          END OF MESSAGE CODE.
*
*    ENDOFMESSAGE IS INSERTED
*    BEFORE THIS ROUTINE IS CALLED IF IT IS WANTED IN THE RECORD.
*
****************************************************************
*
*
*    REMOVE BRACKETS ] AND [ FROM THE TEXT.
*
FILEOUT   TEMPOUT = REPLACE(BUF,'][',BLANK BLANK)
RELOOP1   TEMPOUT DOUBLEBLANK = BLANK  :S(RELOOP1)
*
*    SEND THE TEXT TO THE FILE.
*    ALSO BREAK UP INTO BLOCKS AT A BLANK SO THAT
*    THE COMMUNICATIONS PROCESSOR DOESN'T ELIMINATE
*    IMPORTANT BLANKS.
*
          TEMPOUT = RECORDON TEMPOUT DUPL(DELETE,86)
+         RECORDOFF DELETE BLANK
REDOO     TEMPOUT (TAB(70) BREAK(BLANK)) . T =  :F(LAST)
          TRANSTEXT = T  :(REDOO)
*
LAST      TRANSTEXT = TEMPOUT  :(RETURN)
*
*
****************************************************************
*
*
*    DEFINE SOME ERROR MESSAGES.
*
NORULEAPPLIES OUTPUT = ' NO RULE APPLIES. RULES ATTEMPTING '
+         'TO USE ARE <' GRAPHEME 'RULE.' QUAL '>'
          OUTPUT = ' THE CONTENTS ARE <' $(GRAPHEME 'RULE.' QUAL) '>'
          OUTPUT = ' CHARACTER ATTEMPTING TO PROCESS IS <'
+         GRAPHEME '>'  :(REREED)
*
RULESYNTAXERROR OUTPUT = ' SYNTAX ERROR IN RULE FORMATION '
+         'RULE IS ' RULE  :(REREED)
*
EOF       OUTPUT = ' EOF ENCOUNTERED IN INPUT FILE'
          OUTPUT = ' DO YOU WISH TO CONTINUE? '
          ANS = INPUT
          ANS NOANS  :F(BEG)
          ENDFILE(TRANSNO)
          ENDFILE(STATNO)
DONE      OUTPUT = ' ALL DONE '
*
*
END
PROGRAM DOCUMENTATION FOR DICT


DICT searches an English dictionary file specified by the user for all the words which
match a specified rule. The rule, similar to the rules for the translation program, consists
of only the left part of the rule, as no translation is needed. Since the program must find
all occurrences of a match, the brackets, ‘[’ and ‘]’, are not needed. Special symbols
retaining their meaning from the translation program may also be used in the rule. A
double quote ‘"’ may be used to delimit the rule on the left or right but is necessary only
to make a trailing blank unambiguous.
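
The role of the delimiting quotes can be illustrated with a short sketch. It is written in Python rather than SNOBOL, purely as an illustration; the function names are hypothetical. It assumes, as the DICT listing indicates, that trailing blanks are trimmed from every input line before the rule is examined, which is why a rule ending in a blank has to be quoted.

```python
def strip_rule_quotes(rule: str, quote: str = '"') -> str:
    """Remove one optional delimiting quote from each end of a rule."""
    if rule.startswith(quote):
        rule = rule[len(quote):]
    if rule.endswith(quote):
        rule = rule[:-len(quote)]
    return rule


def read_rule(line: str) -> str:
    """Mimic rule entry: trim trailing blanks first, then drop the quotes.

    Illustrative only; the trimming step stands in for the input
    trimming option that the DICT listing turns on.
    """
    trimmed = line.rstrip(" ")
    return strip_rule_quotes(trimmed)


if __name__ == "__main__":
    print(repr(read_rule('ED ')))    # the trailing blank is lost
    print(repr(read_rule('"ED "')))  # the quotes protect the trailing blank
```

Without the quotes the two rules would be indistinguishable after trimming, so the quoted form is the only way to ask for a context that ends at a word boundary.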

This program permits the testing of a proposed alteration or addition to the rules by
finding all the words which would match the new rule. A sample dialog is shown in Fig.
B1.


Initially  the  program  requests  the  names  of  the  dictionary  file,  the  result  file,  and  the 
size  of  the  dictionary  file.  The  input  terminal  (TTY)  is  accepted  as  a valid  file  name.  The 
routine  FILEDEFINE  also  makes  the  logical  name  correspondences  between  ENTRY  and 
the  dictionary  file  and  RESULT  and  the  output  file. 

The program will assign space for two arrays having the size specified by the user in
FILEDEFINE. The routine READFILE inputs the dictionary file into these arrays, one
array (ENGDICT) for the English text and the other array (IPADICT) for the IPA
representation, if present. DICT can search the IPA array for a match with the rule if wished.
Therefore the program determines from the user whether an English or IPA search is
required. After this information is recorded, the new rule to be tested is read into RULE.
The routine FIND scans through the specified dictionary array, searching for a match with
the rule specified. When FIND finds a match, it writes the matched word to the output file.
A count of the number of matches is kept in TOTAL and written after all the matches are
found. When special symbols are included in the rule, a special pattern must be built.
This pattern is built in the same way that SPECIALCASEPROC and SPECIALBREAK of
the translation program build the pattern.
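
The search just described can be sketched compactly. The following is an illustrative Python analogue, not the DICT program itself: it translates a rule into a regular expression instead of a SNOBOL pattern. The character classes are copied from the pattern definitions in the listing; the special symbol characters assigned to them (`#`, `*`, `.`, `$`, `%`, `&`, `@`, `^`, `+`, `:`) are an assumed reading, since several of them are illegible in the printed listing. The names `SPECIAL`, `build_pattern`, and `find` are hypothetical.

```python
import re

# Character classes as defined in the listing.
VOWEL = "AEIOUY"
CONSONANT = "BCDFGHJKLMNPQRSTVWXZ"
VOICED = "BDVGJLMNRWZ"
FRONT = "EIY"

# Special symbol -> regular-expression fragment.
# ASSUMPTION: the symbol characters are a best reading of the listing.
SPECIAL = {
    "#": f"[{VOWEL}]+",                       # one or more vowels
    "*": f"[{CONSONANT}]+",                   # one or more consonants
    ".": f"[{VOICED}]",                       # one voiced consonant
    "$": f"[{CONSONANT}][EI]",                # one consonant, then E or I
    "%": "(?:ER |E |ES |ED |ING |ELY )",      # suffix, ending at a blank
    "&": "(?:[SCGZXJ]|CH|SH)",                # sibilant
    "@": "(?:[TSRDLZNJ]|TH|CH|SH)",           # nonpalatal set from the listing
    "^": f"[{CONSONANT}]",                    # exactly one consonant
    "+": f"[{FRONT}]",                        # front vowel
    ":": f"[{CONSONANT}]*",                   # zero or more consonants
}


def build_pattern(rule: str) -> "re.Pattern[str]":
    """Concatenate literal text and special-symbol fragments, left to right."""
    parts = [SPECIAL.get(ch, re.escape(ch)) for ch in rule]
    return re.compile("".join(parts))


def find(rule: str, words: list) -> list:
    """Return every word matching the rule anywhere within it.

    Each word is padded with a blank on both sides, as READFILE does,
    so that a blank in the rule can stand for a word boundary.
    """
    pattern = build_pattern(rule)
    return [w for w in words if pattern.search(f" {w} ")]


if __name__ == "__main__":
    sample = ["HEAD", "READY", "MONTHS", "EUROPE", "MEDIUM"]
    print(find("EAD", sample))   # literal context
    print(find("+U", sample))    # front vowel followed by U
    print(find("^^^ ", sample))  # three consonants at the end of a word
```

Because the match is unanchored, a rule without blanks finds the context anywhere in a word, which is the behavior wanted when checking how many dictionary entries a proposed rule would affect.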
START OF PROGRAM -- LAST UPDATE APRIL 4, 1975
WHAT IS THE DICTIONARY FILE NAME?
BRN4K
IS IT AN ENGLISH AND IPA FILE?
N
WHAT IS THE FILENAME FOR THE RESULTS?
TTY
EOF IN READING
WHAT IS THE CONTEXT TO SEARCH FOR?

♦D* 

SEARCH STARTING

ENGLISH     IPA

EUROPE
EUROPEAN
MEDIUM
NEUTRAL
MUSEUM
LIEUTENANT

TOTAL MATCHES = 6
END OF SEARCH
WHAT IS THE CONTEXT TO SEARCH FOR?

SEARCH STARTING

ENGLISH     IPA

MONTHS
STRENGTH
LENGTH
RIGHTS
THOUGHTS
LIGHTS
ATTEMPTS
NIGHTS
WARMTH

TOTAL MATCHES = 9
END OF SEARCH
WHAT IS THE CONTEXT TO SEARCH FOR?
EAD
SEARCH STARTING

ENGLISH     IPA

HEAD
ALREADY
DEAD
INSTEAD
READ
READY
READING
LEAD
AHEAD
LEADERS
LEADERSHIP
SPREAD
LEADER
LEADING
HEADQUARTERS
HEADED
HEADS
READER
READILY
DREAD
STEADY
READERS
LEADS
HEADING
WIDESPREAD

TOTAL MATCHES = 25
END OF SEARCH
WHAT IS THE CONTEXT TO SEARCH FOR?

WHAT TYPE OF SEARCH DO YOU WANT--ENG OR IPA?

WANT TO QUIT?


Fig.  B1  — Sample 
dialog  with  DICT 
DICT Program Listing


****************************************************************
*
*    ***** DICT *****
*
*    THIS PROGRAM SEARCHES A DICTIONARY FILE OF ENGLISH WORDS
*    AND THEIR IPA TRANSCRIPTIONS ACCORDING TO A RULE SPECIFIED
*    BY THE USER.
*
****************************************************************
*
*
*    DEFINE THE FUNCTIONS.
*
*    PREDEFINE ASKS FOR INPUT DICTIONARY FILE NAME AND RESULT FILE
*    NAME. TTY IS LEGAL INPUT TO EITHER QUESTION.
*
          DEFINE('FILEDEFINE()')
*
*    READFILE INPUTS THE DICTIONARY FILE INTO THE ARRAYS
*    ENGDICT AND IPADICT. TO DO THIS READFILE BREAKS EACH RECORD
*    OF THE FILE INTO TWO PIECES, THE ENGLISH AND IPA.
*
          DEFINE('READFILE()')
*
*    FIND IS THE ROUTINE WHICH SCANS THE DICTIONARY ARRAYS FOR A
*    RULE MATCH. ON FINDING ONE FIND OUTPUTS THE RESULT TO EITHER
*    THE TTY OR A SPECIFIED FILE AS PREDEFINED.
*    PARAMETERS ARE THE RULE SPECIFIED TO SEARCH ON,
*    THE ARRAY TO SEARCH EITHER IPA OR ENG,
*    AND THE INDEX SET--THIS IS IN CASE THE FILE IS AN INDEXED FILE.
*    IN THIS CASE THE USER MAY SPECIFY AN INDEX SET--THIS FEATURE
*    NOT IMPLEMENTED YET.
*
          DEFINE('FIND(RULE,QUAL,INDEX)')
*
*
*    MAIN PROGRAM STARTS HERE.
*
*
          INPUT('INPUT',2,80)
*
*    SET TRIM OPTION SO ALL INPUT DONE WITH TRAILING BLANKS TRUNCATED
*
          &TRIM = 1
          &STLIMIT = 100000000
          DICTSIZE = 4000
*
*    INIT SOME VARIABLES.
*
          NULL =
          BLANK = ' '
          DOUBLEBLANK = '  '
          SLASH = '/'
          INDEX = NULL
ENDTEXT  =*  *** 

          QUOTE = '"'
          TTY = POS(0) 'TTY' RPOS(0)
          YESANS = POS(0) ('Y' ! 'YES') RPOS(0)
*
*    INIT SOME VARIABLES USED IN SPECIAL CASE ROUTINES.
*    THIS INIT IS DONE IN BEGINNING FOR EFFICIENCY.
*
          SPECIALCASE = '#*.$%&@^+:'
*
*    DEFINE THE SPECIAL PATTERNS TO BE USED.
*
          VOWEL = 'AEIOUY'
          CONSONANT = 'BCDFGHJKLMNPQRSTVWXZ'
          VOICED = 'BDVGJLMNRWZ'
          FRONT = 'EIY'
          SUFFIX = 'ER ' ! 'E ' ! 'ES ' ! 'ED ' ! 'ING ' ! 'ELY '
          SIBILANT = ANY('SCGZXJ') ! 'CH' ! 'SH'
          NONPAL = ANY('TSRDLZNJ') ! 'TH' ! 'CH' ! 'SH'
          $'PATTERN#' = ANY(VOWEL) ARBNO(ANY(VOWEL))
          $'PATTERN*' = ANY(CONSONANT) ARBNO(ANY(CONSONANT))
          $'PATTERN.' = ANY(VOICED)
          $'PATTERN$' = ANY(CONSONANT) ANY('EI')
          $'PATTERN%' = SUFFIX
          $'PATTERN&' = SIBILANT
          $'PATTERN@' = NONPAL
          $'PATTERN^' = ANY(CONSONANT)
          $'PATTERN+' = ANY(FRONT)
          $'PATTERN:' = ARBNO(ANY(CONSONANT))
          ENGL = 'ENG' ! 'E'
          IPAR = 'IPA' ! 'IP' ! 'I'
          ENGANS = POS(0) ENGL RPOS(0)
          IPAANS = POS(0) IPAR RPOS(0)
*
*    START PROGRAM.
*
          OUTPUT = ' START OF PROGRAM -- LAST UPDATE APRIL 4, 1975'
* 
REDEFINE  FILEDEFINE()
*
*    DEFINE THE ARRAYS.
*    4000 IS CURRENTLY THE LIMIT ON THE SIZE OF THE ARRAYS.
*
          ENGDICT = ARRAY(DICTSIZE)
          IPADICT = ARRAY(DICTSIZE)
*
*    READ IN THE FILE AND PLACE INTO ARRAYS.
*
          READFILE()
*
*    IF SINGLE FILE SET SEARCHTYPE AUTOMATICALLY ENG.
*
          SINGLEFLAG 'ON'  :S(QUALSET)
*
*    QUERY FOR SEARCH TYPE.
*
SEARCHCHANGE OUTPUT = ' WHAT TYPE OF SEARCH DO YOU WANT--ENG OR IPA?'
          QUAL = INPUT
*
*    SEE IF NOTHING RESPONDED. IF NONE MAY WISH TO QUIT.
*
          IDENT(QUAL,NULL)  :S(QUIT)
          QUAL ENGANS = 'ENG'  :S(ASKAGAIN)
          QUAL IPAANS = 'IPA'  :F(SEARCHCHANGE)
*
*    QUERY FOR PATTERN TO SEARCH FOR.
*
ASKAGAIN  OUTPUT = ' WHAT IS THE CONTEXT TO SEARCH FOR?'
          RULE = INPUT
          RULE POS(0) QUOTE = NULL
          RULE QUOTE RPOS(0) = NULL
*
*    SEE IF USER WISHES TO REDEFINE THE SEARCH TYPE.
*    IF SO A NULL ANS GIVEN.
*
          IDENT(RULE,NULL)  :S(SEARCHCHANGE)
*
          OUTPUT = ' SEARCH STARTING '
          RESULT = ' ENGLISH     IPA '
          RESULT = ' '
*
*    FIND THE APPLICABLE ENTRIES.
*
          FIND(RULE,QUAL,INDEX)
*
          RESULT = ' '
          RESULT = ' '
          RESULT = ' TOTAL MATCHES = ' TOTAL
          RESULT = ' '
          OUTPUT = ' END OF SEARCH '  :(ASKAGAIN)
*
QUALSET   QUAL = 'ENG'  :(ASKAGAIN)
****************************************************************
*
*    FIND
*
*    DEFINE FIND ROUTINE WHICH SEARCHES THE DICTIONARY
*    REQUESTED FOR THE SEQUENCE OF CHARACTERS PASSED.
*
*    PARAMETERS ARE:
*
*    RULE   WHICH INDICATES THE SEQUENCE OF CHARACTERS
*           TO SEARCH FOR IN THE DICTIONARY.
*           A SPECIAL CASE SYMBOL MAY BE USED. IN THIS CASE
*           SPECIALCASEPROC IS USED TO BUILD A PATTERN.
*
*    QUAL   WHICH INDICATES WHETHER THE ENGLISH (ENG) OR
*           IPA DICTIONARY IS TO BE SEARCHED.
*
*    INDEX  WHICH INDICATES WHICH ENTRIES IN THE DICTIONARY MAY
*           FULFILL THE RULE REQUIRED.
*           IF INDEX IS NULL, A SEQUENTIAL SEARCH IS PERFORMED.
*
****************************************************************
*
*    IT IS ASSUMED THAT ENGDICT AND IPADICT ARE INITIALIZED.
*    THIS INITIALIZATION IS DONE BY THE READDICT ROUTINE.
*
*    SEE IF ANY SPECIAL SYMBOLS OCCUR IN THE RULE PASSED.
*    IF SO SPECIALCASEPROC MUST BE INVOKED.
*
FIND      TOTAL = 0
          RULE ANY(SPECIALCASE)  :S(SPECIALCASEPROC)
*
*    NO SPECIAL CASE SYMBOLS OR ELSE RETURNED FROM PATTERN BUILDING
*    OF THE SPECIALCASEPROC.
*
INDEXTEST IDENT(INDEX,NULL)  :S(INC)
*
*    THERE ARE INDEXES SPECIFIED--GET THEM ONE BY ONE.
*
NEXT      INDEX BREAK(', ') . I ANY(', ') = NULL  :F(RETURN)
*
*    SEE IF THE FIRST ENTRY SUGGESTED MATCHES THE RULE PASSED.
*    BUILD THE NAME OF THE ARRAY TO BE CHECKED.
*    INCLUDE THE ENTRY TO CHECK. THIS IS INDICATED BY VARIABLE I.
*
          ITEM($(QUAL 'DICT'),I) RULE  :F(NEXT)
          TOTAL = TOTAL + 1
          RESULT = ENGDICT<I> ' ' IPADICT<I>  :(NEXT)
*
*    SET THE INDEX TO 1.
*
INC       I = 1
*
*    SEE IF RULE APPLIES--IF SO, OUTPUT RESULTS.
*    CONTINUE SEARCH.
*
ITEM      ITEM($(QUAL 'DICT'),I) RULE  :F(NEXT2)
*
*    SEE IF SINGLE SEARCH AND ONLY TO PRINT ONE ARRAY.
*
          SINGLEFLAG 'ON'  :S(ONEOUT)
          TOTAL = TOTAL + 1
          RESULT = ENGDICT<I> ' ' IPADICT<I>
NEXT2     I = LT(I,SIZE) I + 1  :F(RETURN)S(ITEM)
*
ONEOUT    TOTAL = TOTAL + 1
          RESULT = ENGDICT<I>  :(NEXT2)
*

*    DEFINE THE SPECIALCASE ROUTINE TO BUILD PATTERN AS SPECIFIED
*    BY THE SPECIAL SYMBOLS #*.$%&@^+ AND : .
*    ALSO NOTE THAT THESE SYMBOLS AND THEIR CORRESPONDING
*    PATTERNS ARE INITIALIZED IN THE BEGINNING OF THE
*    PROGRAM FOR EFFICIENCY.
*    THEY APPEAR HERE AS COMMENTS FOR READABILITY.
*
*
*    SPECIALCASE = '#*.$%&@^+:'
*    VOWEL = 'AEIOUY'
*    CONSONANT = 'BCDFGHJKLMNPQRSTVWXZ'
*    VOICED = 'BDVGJLMNRWZ'
*    FRONT = 'EIY'
*    SUFFIX = 'ER ' ! 'E ' ! 'ES ' ! 'ED ' ! 'ING ' ! 'ELY '
*    SIBILANT = ANY('SCGZXJ') ! 'CH' ! 'SH'
*    NONPAL = ANY('TSRDLZNJ') ! 'TH' ! 'CH' ! 'SH'
*    $'PATTERN#' = ANY(VOWEL) ARBNO(ANY(VOWEL))
*    $'PATTERN*' = ANY(CONSONANT) ARBNO(ANY(CONSONANT))
*    $'PATTERN.' = ANY(VOICED)
*    $'PATTERN$' = ANY(CONSONANT) ANY('EI')
*    $'PATTERN%' = SUFFIX
*    $'PATTERN&' = SIBILANT
*    $'PATTERN@' = NONPAL
*    $'PATTERN^' = ANY(CONSONANT)
*    $'PATTERN+' = ANY(FRONT)
*    $'PATTERN:' = ARBNO(ANY(CONSONANT))
*
*
*    START THE ROUTINE HERE.
*
SPECIALCASEPROC PATTERN = NULL
*
*    REPLACE EVERYTHING UP TO THE SPECIAL CHAR BY NULL AND ASSIGN
*    WHAT MATCHED TO THE PATTERN BEING BUILT.
*
REMATCH   RULE BREAK(SPECIALCASE) . PATTERN1 = NULL  :F(ALLDONE)
*
*    BREAK OFF SPECIAL CHAR AND REPLACE IT BY NULL IN ORIGINAL STREAM.
*
          RULE LEN(1) . SYM = NULL
*
*    FIND THE PIECE OF PATTERN WHICH CORRESPONDS TO THE SPECIAL SYM.
*
          PATTERN = PATTERN PATTERN1 $('PATTERN' SYM)  :(REMATCH)
*
*    AT CONCLUSION OF SCAN THRU RULE RETURN WITH RULE IN IT TO
*    THE PATTERN WHICH WAS BUILT.
*
ALLDONE   RULE = PATTERN RULE  :(INDEXTEST)
****************************************************************
*
*    READFILE
*
*    DEFINE THE READFILE ROUTINE.
*    READS IN THE DICTIONARY FILE AS SPECIFIED BY THE USER.
*    BUILDS TWO ARRAYS FOR THE PROGRAM BASED ON THIS FILE.
*    THE ARRAYS ARE THE IPA AND THE ENGLISH DICTIONARY ARRAYS.
*    THE INPUT FILE MUST BE IN THE FORM :
*
*    ENGLISHWORD ENDTEXTMARK IPAWORD ENDTEXT
*
****************************************************************
*
*
READFILE  I = 1
          SINGLEFLAG 'ON'  :S(READONE)
READAGAIN ENTRY BREAK(ENDTEXT) . ENGDICT<I> ENDTEXT
+         BREAK(ENDTEXT) . IPADICT<I>  :F(ERRORINREAD)
          ENGDICT<I> = BLANK ENGDICT<I> BLANK
          IPADICT<I> = SLASH BLANK IPADICT<I> BLANK SLASH
          IPADICT<I> POS(1) DOUBLEBLANK = BLANK
          IPADICT<I> DOUBLEBLANK RPOS(1) = BLANK
*
*    SEE IF INDEX IS OUT OF RANGE OF SIZE OF FILE.
*
          I = LT(I,SIZE) I + 1  :F(RETURN)S(READAGAIN)
*
*    NOTE THAT ENTRY RESULTS IN ONE RECORD BEING READ FROM THE
*    INPUT FILE WHICH WAS SPECIFIED BY THE USER.
*
READONE   ENGDICT<I> = BLANK ENTRY  :F(ERRORINREAD)
          ENGDICT<I> ENDTEXT = BLANK
          I = LT(I,SIZE) I + 1  :S(READONE)F(RETURN)
*
ERRORINREAD OUTPUT = ' EOF IN READING '
          SIZE = I - 1  :(RETURN)
****************************************************************
*
*    FILEDEFINE
*
*    DEFINE THE ROUTINE TO GET INPUT AND OUTPUT FILE NAMES.
*
****************************************************************
*
FILEDEFINE INNO = 22
          SINGLEFLAG = 'OFF'
          SIZE = DICTSIZE
          OUTNO = 24
          OUTPUT = ' WHAT IS THE DICTIONARY FILE NAME? '
          DICT = INPUT
*
*    SEE IF IT IS TTY.
*
          DICT TTY  :F(OKAY)
          INNO = 2
          OUTPUT = ' WHAT IS THE SIZE OF THE INPUT FILE?'
          SIZE = INPUT
*
*    THIS RESULT IS NEEDED FROM THE EXTERNAL FILE ALSO.
*
OKAY      IFILE(INNO,DICT)
          OUTPUT = ' IS IT AN ENGLISH AND IPA FILE?'
          INPUT YESANS  :S(DEFINEOUT)
          SINGLEFLAG = 'ON'
DEFINEOUT OUTPUT = ' WHAT IS THE FILENAME FOR THE RESULTS?'
          OUTFILE = INPUT
          OUTFILE TTY  :F(FILE)
          OUTNO = 2
FILE      OUTFILE POS(0) OLDOUTFILE RPOS(0)  :S(DEF)
          OLDOUTFILE = OUTFILE
          ENDFILE(OUTNO)
DEF       OFILE(OUTNO,OUTFILE)
          INPUT('ENTRY',INNO,80)
          OUTPUT('RESULT',OUTNO,'(1X,15A5)')  :(RETURN)
*
*
****************************************************************
*
*
*    DEFINE WHAT HAPPENS ON NULL RESPONSE TO SEARCH TYPE.
*
QUIT      OUTPUT = ' WANT TO QUIT?'
          INPUT POS(0) 'N'  :S(REDEFINE)
          ENDFILE(OUTNO)
          OUTPUT = ' ALL DONE '


Appendix  C 

CONVERSION  OF  SOFTWARE  TO  FASBOL 

The SNOBOL processor on the PDP-10 system is an interpretive implementation of
SNOBOL 4.* Since TRANS is itself an interpreter for the letter-to-sound rules, when it is
running on the SNOBOL processor, it suffers from all the inefficiency one would expect
of an interpreter interpreting an interpreter. TRANS was never intended for production
runs; it and the other SNOBOL programs we wrote are research tools to facilitate the
development of the letter-to-sound rules. We were therefore prepared to pay a price in
efficiency for the convenience of working in a high-level pattern-matching language and
being able to change the program or rules easily. We could not remain completely
indifferent to efficiency, however; we found that translating a single 1000-word sample from the
Brown Corpus required an overnight computer run of many hours. When a compiled version
of SNOBOL became available to us, we consequently converted our software to take
advantage of it.

The compiler, FASBOL II,† became usable on NRL’s PDP-10 shortly before we were
ready to start translating large samples from the Brown Corpus with version 3 of the rules.
The FASBOL version of DICT was ready soon enough to be used in part of the work on
version 3, and none of the third, longest, series of translations of large samples were begun
until TRANS had been converted. STAT was converted as well.

The program sections that open and close input and output files had to be rewritten,
but the source languages for the SNOBOL interpreter and the FASBOL compiler are
compatible enough that no other significant changes would have been necessary. We made some
further changes, following suggestions in the FASBOL manual† for enhancing the efficiency
of FASBOL programs. We also used a FASBOL feature that provides a
statement-by-statement analysis of execution time; once we had identified the critical statements, we
tried rewriting them to speed up the programs further. This last attempt met with such
indifferent success as only to reinforce a conclusion we had reached working with SNOBOL:
one’s intuition of what ought to be fast is no guide to what is fast.

After conversion to FASBOL, TRANS ran about 25 times as fast as it had before.
DICT, while simply reading words from a file and storing them in an array, ran 35 times
as fast after conversion; while searching the array for the words that match a pattern, it
ran 3 to 8 times as fast after conversion. These speedup factors do not necessarily reflect
the intrinsic difference in speed between the FASBOL and SNOBOL systems, since some
of the program changes might well have increased the speed of the SNOBOL version as well;
others might even have slowed it down. Nevertheless the speed increase brought with it a
substantial increase in convenience. Translation rates are up from one word every
half-minute or minute to one word every second or two. This is within a factor of 4 or 5 of
real-time speech rates, and an implementation designed for efficiency rather than
convenient experimentation, by stripping away another layer of interpretive overhead, would
certainly run much faster than that.
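
The figures quoted in this appendix can be cross-checked with a little arithmetic. The sketch below is illustrative only, written in Python with hypothetical function names; it takes the rounded per-word times and the 25-fold speedup from the text and makes no claim about the actual measured run times.

```python
def run_hours(words: int, seconds_per_word: float) -> float:
    """Total run time in hours for a sample of the given size."""
    return words * seconds_per_word / 3600.0


def seconds_after_speedup(seconds_per_word: float, speedup: float) -> float:
    """Per-word time after dividing by a speedup factor."""
    return seconds_per_word / speedup


if __name__ == "__main__":
    SAMPLE_WORDS = 1000   # size of one Brown Corpus sample, from the text
    SPEEDUP = 25          # approximate TRANS speedup, from the text
    for before in (30.0, 60.0):  # "half-minute or minute" per word
        after = seconds_after_speedup(before, SPEEDUP)
        print(f"{before:.0f} s/word: {run_hours(SAMPLE_WORDS, before):.1f} h "
              f"per sample before, {after:.1f} s/word and "
              f"{run_hours(SAMPLE_WORDS, after) * 60:.0f} min per sample after")
```

At 30 to 60 seconds per word, a 1000-word sample takes roughly 8 to 17 hours, which agrees with the overnight runs described earlier, and a 25-fold speedup brings the rate to between 1 and 2.5 seconds per word, in line with "one word every second or two."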

*R.E. Griswold, J.F. Poage, and I.P. Polonsky, The SNOBOL 4 Programming Language, Prentice-Hall,
Englewood Cliffs, N.J., 2nd edition, 1971.

†P.J. Santos, Jr., “FASBOL II, a SNOBOL Compiler for the PDP-10,” DECUS No. 10-179, Digital
Equipment Computer Users’ Society, Dec. 1972.
Not only did conversion to FASBOL increase our programs’ speed, it reduced their
memory requirements, in some cases threefold. It thus became possible to run DICT on
much larger word lists than before.

We  are  reproducing  the  SNOBOL  versions  of  the  programs  in  this  report,  since  SNOBOL 
4 interpreters  are  more  widely  available  than  the  FASBOL  compiler.  The  FASBOL  versions 
are  available  from  the  authors. 


